altEngine
 All Classes Files Functions Variables Typedefs Enumerations Enumerator Friends Macros
stb_image.h
Go to the documentation of this file.
1 /* stb_image - v2.13 - public domain image loader - http://nothings.org/stb_image.h
2 no warranty implied; use at your own risk
3 
4 Do this:
5 #define STB_IMAGE_IMPLEMENTATION
6 before you include this file in *one* C or C++ file to create the implementation.
7 
8 // i.e. it should look like this:
9 #include ...
10 #include ...
11 #include ...
12 #define STB_IMAGE_IMPLEMENTATION
13 #include "stb_image.h"
14 
15 You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
16 And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
17 
18 
19 QUICK NOTES:
20 Primarily of interest to game developers and other people who can
21 avoid problematic images and only need the trivial interface
22 
23 JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
24 PNG 1/2/4/8-bit-per-channel (16 bpc not supported)
25 
26 TGA (not sure what subset, if a subset)
27 BMP non-1bpp, non-RLE
28 PSD (composited view only, no extra channels, 8/16 bit-per-channel)
29 
30 GIF (*comp always reports as 4-channel)
31 HDR (radiance rgbE format)
32 PIC (Softimage PIC)
33 PNM (PPM and PGM binary only)
34 
35 Animated GIF still needs a proper API, but here's one way to do it:
36 http://gist.github.com/urraka/685d9a6340b26b830d49
37 
38 - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
39 - decode from arbitrary I/O callbacks
40 - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
41 
42 Full documentation under "DOCUMENTATION" below.
43 
44 
45 Revision 2.00 release notes:
46 
47 - Progressive JPEG is now supported.
48 
49 - PPM and PGM binary formats are now supported, thanks to Ken Miller.
50 
51 - x86 platforms now make use of SSE2 SIMD instructions for
52 JPEG decoding, and ARM platforms can use NEON SIMD if requested.
53 This work was done by Fabian "ryg" Giesen. SSE2 is used by
54 default, but NEON must be enabled explicitly; see docs.
55 
56 With other JPEG optimizations included in this version, we see
57 2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
58 on a JPEG on an ARM machine, relative to previous versions of this
59 library. The same results will not obtain for all JPGs and for all
60 x86/ARM machines. (Note that progressive JPEGs are significantly
61 slower to decode than regular JPEGs.) This doesn't mean that this
62 is the fastest JPEG decoder in the land; rather, it brings it
63 closer to parity with standard libraries. If you want the fastest
64 decode, look elsewhere. (See "Philosophy" section of docs below.)
65 
66 See final bullet items below for more info on SIMD.
67 
68 - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
69 the memory allocator. Unlike other STBI libraries, these macros don't
70 support a context parameter, so if you need to pass a context in to
71 the allocator, you'll have to store it in a global or a thread-local
72 variable.
73 
74 - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
75 STBI_NO_LINEAR.
76 STBI_NO_HDR: suppress implementation of .hdr reader format
77 STBI_NO_LINEAR: suppress high-dynamic-range light-linear float API
78 
79 - You can suppress implementation of any of the decoders to reduce
80 your code footprint by #defining one or more of the following
81 symbols before creating the implementation.
82 
83 STBI_NO_JPEG
84 STBI_NO_PNG
85 STBI_NO_BMP
86 STBI_NO_PSD
87 STBI_NO_TGA
88 STBI_NO_GIF
89 STBI_NO_HDR
90 STBI_NO_PIC
91 STBI_NO_PNM (.ppm and .pgm)
92 
93 - You can request *only* certain decoders and suppress all other ones
94 (this will be more forward-compatible, as addition of new decoders
95 doesn't require you to disable them explicitly):
96 
97 STBI_ONLY_JPEG
98 STBI_ONLY_PNG
99 STBI_ONLY_BMP
100 STBI_ONLY_PSD
101 STBI_ONLY_TGA
102 STBI_ONLY_GIF
103 STBI_ONLY_HDR
104 STBI_ONLY_PIC
105 STBI_ONLY_PNM (.ppm and .pgm)
106 
107 Note that you can define multiples of these, and you will get all
108 of them ("only x" and "only y" is interpreted to mean "only x&y").
109 
110 - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
111 want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
112 
113 - Compilation of all SIMD code can be suppressed with
114 #define STBI_NO_SIMD
115 It should not be necessary to disable SIMD unless you have issues
116 compiling (e.g. using an x86 compiler which doesn't support SSE
117 intrinsics or that doesn't support the method used to detect
118 SSE2 support at run-time), and even those can be reported as
119 bugs so I can refine the built-in compile-time checking to be
120 smarter.
121 
122 - The old STBI_SIMD system which allowed installing a user-defined
123 IDCT etc. has been removed. If you need this, don't upgrade. My
124 assumption is that almost nobody was doing this, and those who
125 were will find the built-in SIMD more satisfactory anyway.
126 
127 - RGB values computed for JPEG images are slightly different from
128 previous versions of stb_image. (This is due to using less
129 integer precision in SIMD.) The C code has been adjusted so
130 that the same RGB values will be computed regardless of whether
131 SIMD support is available, so your app should always produce
132 consistent results. But these results are slightly different from
133 previous versions. (Specifically, about 3% of available YCbCr values
134 will compute different RGB results from pre-1.49 versions by +-1;
135 most of the deviating values are one smaller in the G channel.)
136 
137 - If you must produce consistent results with previous versions of
138 stb_image, #define STBI_JPEG_OLD and you will get the same results
139 you used to; however, you will not get the SIMD speedups for
140 the YCbCr-to-RGB conversion step (although you should still see
141 significant JPEG speedup from the other changes).
142 
143 Please note that STBI_JPEG_OLD is a temporary feature; it will be
144 removed in future versions of the library. It is only intended for
145 near-term back-compatibility use.
146 
147 
148 Latest revision history:
149 2.13 (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
150 2.12 (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
151 2.11 (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
152 RGB-format JPEG; remove white matting in PSD;
153 allocate large structures on the stack;
154 correct channel count for PNG & BMP
155 2.10 (2016-01-22) avoid warning introduced in 2.09
156 2.09 (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
157 2.08 (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
158 2.07 (2015-09-13) partial animated GIF support
159 limited 16-bit PSD support
160 minor bugs, code cleanup, and compiler warnings
161 
162 See end of file for full revision history.
163 
164 
165 ============================ Contributors =========================
166 
167 Image formats Extensions, features
168 Sean Barrett (jpeg, png, bmp) Jetro Lauha (stbi_info)
169 Nicolas Schulz (hdr, psd) Martin "SpartanJ" Golini (stbi_info)
170 Jonathan Dummer (tga) James "moose2000" Brown (iPhone PNG)
171 Jean-Marc Lienher (gif) Ben "Disch" Wenger (io callbacks)
172 Tom Seddon (pic) Omar Cornut (1/2/4-bit PNG)
173 Thatcher Ulrich (psd) Nicolas Guillemot (vertical flip)
174 Ken Miller (pgm, ppm) Richard Mitton (16-bit PSD)
175 urraka@github (animated gif) Junggon Kim (PNM comments)
176 Daniel Gibson (16-bit TGA)
177 
178 Optimizations & bugfixes
179 Fabian "ryg" Giesen
180 Arseny Kapoulkine
181 
182 Bug & warning fixes
183 Marc LeBlanc David Woo Guillaume George Martins Mozeiko
184 Christpher Lloyd Martin Golini Jerry Jansson Joseph Thomson
185 Dave Moore Roy Eltham Hayaki Saito Phil Jordan
186 Won Chun Luke Graham Johan Duparc Nathan Reed
187 the Horde3D community Thomas Ruf Ronny Chevalier Nick Verigakis
188 Janez Zemva John Bartholomew Michal Cichon svdijk@github
189 Jonathan Blow Ken Hamada Tero Hanninen Baldur Karlsson
190 Laurent Gomila Cort Stratton Sergio Gonzalez romigrou@github
191 Aruelien Pocheville Thibault Reuille Cass Everitt Matthew Gregan
192 Ryamond Barbiero Paul Du Bois Engin Manap snagar@github
193 Michaelangel007@github Oriol Ferrer Mesia socks-the-fox Zelex@github
194 Philipp Wiesemann Josh Tobin rlyeh@github grim210@github
195 Blazej Dariusz Roszkowski
196 
197 
198 LICENSE
199 
200 This software is dual-licensed to the public domain and under the following
201 license: you are granted a perpetual, irrevocable license to copy, modify,
202 publish, and distribute this file as you see fit.
203 
204 */
205 
206 #ifndef STBI_INCLUDE_STB_IMAGE_H
207 #define STBI_INCLUDE_STB_IMAGE_H
208 
209 // DOCUMENTATION
210 //
211 // Limitations:
212 // - no 16-bit-per-channel PNG
213 // - no 12-bit-per-channel JPEG
214 // - no JPEGs with arithmetic coding
215 // - no 1-bit BMP
216 // - GIF always returns *comp=4
217 //
218 // Basic usage (see HDR discussion below for HDR usage):
219 // int x,y,n;
220 // unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
221 // // ... process data if not NULL ...
222 // // ... x = width, y = height, n = # 8-bit components per pixel ...
223 // // ... replace '0' with '1'..'4' to force that many components per pixel
224 // // ... but 'n' will always be the number that it would have been if you said 0
225 // stbi_image_free(data)
226 //
227 // Standard parameters:
228 // int *x -- outputs image width in pixels
229 // int *y -- outputs image height in pixels
230 // int *channels_in_file -- outputs # of image components in image file
231 // int desired_channels -- if non-zero, # of image components requested in result
232 //
233 // The return value from an image loader is an 'unsigned char *' which points
234 // to the pixel data, or NULL on an allocation failure or if the image is
235 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
236 // with each pixel consisting of N interleaved 8-bit components; the first
237 // pixel pointed to is top-left-most in the image. There is no padding between
238 // image scanlines or between pixels, regardless of format. The number of
239 // components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
240 // If req_comp is non-zero, *comp has the number of components that _would_
241 // have been output otherwise. E.g. if you set req_comp to 4, you will always
242 // get RGBA output, but you can check *comp to see if it's trivially opaque
243 // because e.g. there were only 3 channels in the source image.
244 //
245 // An output image with N components has the following components interleaved
246 // in this order in each pixel:
247 //
248 // N=#comp components
249 // 1 grey
250 // 2 grey, alpha
251 // 3 red, green, blue
252 // 4 red, green, blue, alpha
253 //
254 // If image loading fails for any reason, the return value will be NULL,
255 // and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
256 // can be queried for an extremely brief, end-user unfriendly explanation
257 // of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
258 // compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
259 // more user-friendly ones.
260 //
261 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
262 //
263 // ===========================================================================
264 //
265 // Philosophy
266 //
267 // stb libraries are designed with the following priorities:
268 //
269 // 1. easy to use
270 // 2. easy to maintain
271 // 3. good performance
272 //
273 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
274 // and for best performance I may provide less-easy-to-use APIs that give higher
275 // performance, in addition to the easy to use ones. Nevertheless, it's important
276 // to keep in mind that from the standpoint of you, a client of this library,
277 // all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
278 //
279 // Some secondary priorities arise directly from the first two, some of which
280 // make more explicit reasons why performance can't be emphasized.
281 //
282 // - Portable ("ease of use")
283 // - Small footprint ("easy to maintain")
284 // - No dependencies ("ease of use")
285 //
286 // ===========================================================================
287 //
288 // I/O callbacks
289 //
290 // I/O callbacks allow you to read from arbitrary sources, like packaged
291 // files or some other source. Data read from callbacks are processed
292 // through a small internal buffer (currently 128 bytes) to try to reduce
293 // overhead.
294 //
295 // The three functions you must define are "read" (reads some bytes of data),
296 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
297 //
298 // ===========================================================================
299 //
300 // SIMD support
301 //
302 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
303 // supported by the compiler. For ARM Neon support, you must explicitly
304 // request it.
305 //
306 // (The old do-it-yourself SIMD API is no longer supported in the current
307 // code.)
308 //
309 // On x86, SSE2 will automatically be used when available based on a run-time
310 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
311 // the typical path is to have separate builds for NEON and non-NEON devices
312 // (at least this is true for iOS and Android). Therefore, the NEON support is
313 // toggled by a build flag: define STBI_NEON to get NEON loops.
314 //
315 // The output of the JPEG decoder is slightly different from versions where
316 // SIMD support was introduced (that is, for versions before 1.49). The
317 // difference is only +-1 in the 8-bit RGB channels, and only on a small
318 // fraction of pixels. You can force the pre-1.49 behavior by defining
319 // STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
320 // and hence cost some performance.
321 //
322 // If for some reason you do not want to use any of SIMD code, or if
323 // you have issues compiling it, you can disable it entirely by
324 // defining STBI_NO_SIMD.
325 //
326 // ===========================================================================
327 //
328 // HDR image support (disable by defining STBI_NO_HDR)
329 //
330 // stb_image now supports loading HDR images in general, and currently
331 // the Radiance .HDR file format, although the support is provided
332 // generically. You can still load any file through the existing interface;
333 // if you attempt to load an HDR file, it will be automatically remapped to
334 // LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
335 // both of these constants can be reconfigured through this interface:
336 //
337 // stbi_hdr_to_ldr_gamma(2.2f);
338 // stbi_hdr_to_ldr_scale(1.0f);
339 //
340 // (note, do not use _inverse_ constants; stbi_image will invert them
341 // appropriately).
342 //
343 // Additionally, there is a new, parallel interface for loading files as
344 // (linear) floats to preserve the full dynamic range:
345 //
346 // float *data = stbi_loadf(filename, &x, &y, &n, 0);
347 //
348 // If you load LDR images through this interface, those images will
349 // be promoted to floating point values, run through the inverse of
350 // constants corresponding to the above:
351 //
352 // stbi_ldr_to_hdr_scale(1.0f);
353 // stbi_ldr_to_hdr_gamma(2.2f);
354 //
355 // Finally, given a filename (or an open file or memory block--see header
356 // file for details) containing image data, you can query for the "most
357 // appropriate" interface to use (that is, whether the image is HDR or
358 // not), using:
359 //
360 // stbi_is_hdr(char *filename);
361 //
362 // ===========================================================================
363 //
364 // iPhone PNG support:
365 //
366 // By default we convert iphone-formatted PNGs back to RGB, even though
367 // they are internally encoded differently. You can disable this conversion
368 // by by calling stbi_convert_iphone_png_to_rgb(0), in which case
369 // you will always just get the native iphone "format" through (which
370 // is BGR stored in RGB).
371 //
372 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
373 // pixel to remove any premultiplied alpha *only* if the image file explicitly
374 // says there's premultiplied data (currently only happens in iPhone images,
375 // and only if iPhone convert-to-rgb processing is on).
376 //
377 
378 
379 #ifndef STBI_NO_STDIO
380 #include <stdio.h>
381 #endif // STBI_NO_STDIO
382 
383 #define STBI_VERSION 1
384 
385 enum
386 {
387  STBI_default = 0, // only used for req_comp
388 
391  STBI_rgb = 3,
393 };
394 
395 typedef unsigned char stbi_uc;
396 typedef unsigned short stbi_us;
397 
398 #ifdef __cplusplus
399 extern "C" {
400 #endif
401 
402 #ifdef STB_IMAGE_STATIC
403 #define STBIDEF static
404 #else
405 #define STBIDEF extern
406 #endif
407 
409  //
410  // PRIMARY API - works on images of any type
411  //
412 
413  //
414  // load image by filename, open file, or memory buffer
415  //
416 
417  typedef struct
418  {
419  int(*read) (void *user, char *data, int size); // fill 'data' with 'size' bytes. return number of bytes actually read
420  void(*skip) (void *user, int n); // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
421  int(*eof) (void *user); // returns nonzero if we are at end of file/data
423 
425  //
426  // 8-bits-per-channel interface
427  //
428 
429  STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
430  STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
431  STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
432 
433 #ifndef STBI_NO_STDIO
434  STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
435  // for stbi_load_from_file, file pointer is left pointing immediately after image
436 #endif
437 
439  //
440  // 16-bits-per-channel interface
441  //
442 
443  STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
444 #ifndef STBI_NO_STDIO
445  STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
446 #endif
447  // @TODO the other variants
448 
450  //
451  // float-per-channel interface
452  //
453 #ifndef STBI_NO_LINEAR
454  STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
455  STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
456  STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
457 
458 #ifndef STBI_NO_STDIO
459  STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
460 #endif
461 #endif
462 
463 #ifndef STBI_NO_HDR
464  STBIDEF void stbi_hdr_to_ldr_gamma(float gamma);
465  STBIDEF void stbi_hdr_to_ldr_scale(float scale);
466 #endif // STBI_NO_HDR
467 
468 #ifndef STBI_NO_LINEAR
469  STBIDEF void stbi_ldr_to_hdr_gamma(float gamma);
470  STBIDEF void stbi_ldr_to_hdr_scale(float scale);
471 #endif // STBI_NO_LINEAR
472 
473  // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
474  STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
475  STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
476 #ifndef STBI_NO_STDIO
477  STBIDEF int stbi_is_hdr(char const *filename);
478  STBIDEF int stbi_is_hdr_from_file(FILE *f);
479 #endif // STBI_NO_STDIO
480 
481 
482  // get a VERY brief reason for failure
483  // NOT THREADSAFE
484  STBIDEF const char *stbi_failure_reason(void);
485 
486  // free the loaded image -- this is just free()
487  STBIDEF void stbi_image_free(void *retval_from_stbi_load);
488 
489  // get image dimensions & components without fully decoding
490  STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
491  STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
492 
493 #ifndef STBI_NO_STDIO
494  STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp);
495  STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp);
496 
497 #endif
498 
499 
500 
501  // for image formats that explicitly notate that they have premultiplied alpha,
502  // we just return the colors as stored in the file. set this flag to force
503  // unpremultiplication. results are undefined if the unpremultiply overflow.
504  STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
505 
506  // indicate whether we should process iphone images back to canonical format,
507  // or just pass them through "as-is"
508  STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
509 
510  // flip the image vertically, so the first pixel in the output array is the bottom left
511  STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
512 
513  // ZLIB client - used by PNG, available for other purposes
514 
515  STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
516  STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
517  STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
518  STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
519 
520  STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
521  STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
522 
523 
524 #ifdef __cplusplus
525 }
526 #endif
527 
528 //
529 //
531 #endif // STBI_INCLUDE_STB_IMAGE_H
532 
533 #ifdef STB_IMAGE_IMPLEMENTATION
534 
535 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
536  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
537  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
538  || defined(STBI_ONLY_ZLIB)
539 #ifndef STBI_ONLY_JPEG
540 #define STBI_NO_JPEG
541 #endif
542 #ifndef STBI_ONLY_PNG
543 #define STBI_NO_PNG
544 #endif
545 #ifndef STBI_ONLY_BMP
546 #define STBI_NO_BMP
547 #endif
548 #ifndef STBI_ONLY_PSD
549 #define STBI_NO_PSD
550 #endif
551 #ifndef STBI_ONLY_TGA
552 #define STBI_NO_TGA
553 #endif
554 #ifndef STBI_ONLY_GIF
555 #define STBI_NO_GIF
556 #endif
557 #ifndef STBI_ONLY_HDR
558 #define STBI_NO_HDR
559 #endif
560 #ifndef STBI_ONLY_PIC
561 #define STBI_NO_PIC
562 #endif
563 #ifndef STBI_ONLY_PNM
564 #define STBI_NO_PNM
565 #endif
566 #endif
567 
568 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
569 #define STBI_NO_ZLIB
570 #endif
571 
572 
573 #include <stdarg.h>
574 #include <stddef.h> // ptrdiff_t on osx
575 #include <stdlib.h>
576 #include <string.h>
577 #include <limits.h>
578 
579 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
580 #include <math.h> // ldexp
581 #endif
582 
583 #ifndef STBI_NO_STDIO
584 #include <stdio.h>
585 #endif
586 
587 #ifndef STBI_ASSERT
588 #include <assert.h>
589 #define STBI_ASSERT(x) assert(x)
590 #endif
591 
592 
593 #ifndef _MSC_VER
594 #ifdef __cplusplus
595 #define stbi_inline inline
596 #else
597 #define stbi_inline
598 #endif
599 #else
600 #define stbi_inline __forceinline
601 #endif
602 
603 
604 #ifdef _MSC_VER
605 typedef unsigned short stbi__uint16;
606 typedef signed short stbi__int16;
607 typedef unsigned int stbi__uint32;
608 typedef signed int stbi__int32;
609 #else
610 #include <stdint.h>
611 typedef uint16_t stbi__uint16;
612 typedef int16_t stbi__int16;
613 typedef uint32_t stbi__uint32;
614 typedef int32_t stbi__int32;
615 #endif
616 
617 // should produce compiler error if size is wrong
618 typedef unsigned char validate_uint32[sizeof(stbi__uint32) == 4 ? 1 : -1];
619 
620 #ifdef _MSC_VER
621 #define STBI_NOTUSED(v) (void)(v)
622 #else
623 #define STBI_NOTUSED(v) (void)sizeof(v)
624 #endif
625 
626 #ifdef _MSC_VER
627 #define STBI_HAS_LROTL
628 #endif
629 
630 #ifdef STBI_HAS_LROTL
631 #define stbi_lrot(x,y) _lrotl(x,y)
632 #else
633 #define stbi_lrot(x,y) (((x) << (y)) | ((x) >> (32 - (y))))
634 #endif
635 
636 #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
637 // ok
638 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
639 // ok
640 #else
641 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
642 #endif
643 
644 #ifndef STBI_MALLOC
645 #define STBI_MALLOC(sz) malloc(sz)
646 #define STBI_REALLOC(p,newsz) realloc(p,newsz)
647 #define STBI_FREE(p) free(p)
648 #endif
649 
650 #ifndef STBI_REALLOC_SIZED
651 #define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
652 #endif
653 
654 // x86/x64 detection
655 #if defined(__x86_64__) || defined(_M_X64)
656 #define STBI__X64_TARGET
657 #elif defined(__i386) || defined(_M_IX86)
658 #define STBI__X86_TARGET
659 #endif
660 
661 #if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
662 // NOTE: not clear do we actually need this for the 64-bit path?
663 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
664 // (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
665 // this is just broken and gcc are jerks for not fixing it properly
666 // http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
667 #define STBI_NO_SIMD
668 #endif
669 
670 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
671 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
672 //
673 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
674 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
675 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
676 // simultaneously enabling "-mstackrealign".
677 //
678 // See https://github.com/nothings/stb/issues/81 for more information.
679 //
680 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
681 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
682 #define STBI_NO_SIMD
683 #endif
684 
685 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
686 #define STBI_SSE2
687 #include <emmintrin.h>
688 
689 #ifdef _MSC_VER
690 
691 #if _MSC_VER >= 1400 // not VC6
692 #include <intrin.h> // __cpuid
693 static int stbi__cpuid3(void)
694 {
695  int info[4];
696  __cpuid(info, 1);
697  return info[3];
698 }
699 #else
700 static int stbi__cpuid3(void)
701 {
702  int res;
703  __asm {
704  mov eax, 1
705  cpuid
706  mov res, edx
707  }
708  return res;
709 }
710 #endif
711 
712 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
713 
714 static int stbi__sse2_available()
715 {
716  int info3 = stbi__cpuid3();
717  return ((info3 >> 26) & 1) != 0;
718 }
719 #else // assume GCC-style if not VC++
720 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
721 
722 static int stbi__sse2_available()
723 {
724 #if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
725  // GCC 4.8+ has a nice way to do this
726  return __builtin_cpu_supports("sse2");
727 #else
728  // portable way to do this, preferably without using GCC inline ASM?
729  // just bail for now.
730  return 0;
731 #endif
732 }
733 #endif
734 #endif
735 
736 // ARM NEON
737 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
738 #undef STBI_NEON
739 #endif
740 
741 #ifdef STBI_NEON
742 #include <arm_neon.h>
743 // assume GCC or Clang on ARM targets
744 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
745 #endif
746 
747 #ifndef STBI_SIMD_ALIGN
748 #define STBI_SIMD_ALIGN(type, name) type name
749 #endif
750 
752 //
753 // stbi__context struct and start_xxx functions
754 
755 // stbi__context structure is our basic context used by all images, so it
756 // contains all the IO context, plus some basic image information
757 typedef struct
758 {
759  stbi__uint32 img_x, img_y;
760  int img_n, img_out_n;
761 
763  void *io_user_data;
764 
765  int read_from_callbacks;
766  int buflen;
767  stbi_uc buffer_start[128];
768 
769  stbi_uc *img_buffer, *img_buffer_end;
770  stbi_uc *img_buffer_original, *img_buffer_original_end;
771 } stbi__context;
772 
773 
774 static void stbi__refill_buffer(stbi__context *s);
775 
776 // initialize a memory-decode context
777 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
778 {
779  s->io.read = NULL;
780  s->read_from_callbacks = 0;
781  s->img_buffer = s->img_buffer_original = (stbi_uc *)buffer;
782  s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *)buffer + len;
783 }
784 
785 // initialize a callback-based context
786 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
787 {
788  s->io = *c;
789  s->io_user_data = user;
790  s->buflen = sizeof(s->buffer_start);
791  s->read_from_callbacks = 1;
792  s->img_buffer_original = s->buffer_start;
793  stbi__refill_buffer(s);
794  s->img_buffer_original_end = s->img_buffer_end;
795 }
796 
797 #ifndef STBI_NO_STDIO
798 
799 static int stbi__stdio_read(void *user, char *data, int size)
800 {
801  return (int)fread(data, 1, size, (FILE*)user);
802 }
803 
804 static void stbi__stdio_skip(void *user, int n)
805 {
806  fseek((FILE*)user, n, SEEK_CUR);
807 }
808 
809 static int stbi__stdio_eof(void *user)
810 {
811  return feof((FILE*)user);
812 }
813 
814 static stbi_io_callbacks stbi__stdio_callbacks =
815 {
816  stbi__stdio_read,
817  stbi__stdio_skip,
818  stbi__stdio_eof,
819 };
820 
821 static void stbi__start_file(stbi__context *s, FILE *f)
822 {
823  stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *)f);
824 }
825 
826 //static void stop_file(stbi__context *s) { }
827 
828 #endif // !STBI_NO_STDIO
829 
830 static void stbi__rewind(stbi__context *s)
831 {
832  // conceptually rewind SHOULD rewind to the beginning of the stream,
833  // but we just rewind to the beginning of the initial buffer, because
834  // we only use it after doing 'test', which only ever looks at at most 92 bytes
835  s->img_buffer = s->img_buffer_original;
836  s->img_buffer_end = s->img_buffer_original_end;
837 }
838 
839 enum
840 {
841  STBI_ORDER_RGB,
842  STBI_ORDER_BGR
843 };
844 
845 typedef struct
846 {
847  int bits_per_channel;
848  int num_channels;
849  int channel_order;
850 } stbi__result_info;
851 
852 #ifndef STBI_NO_JPEG
853 static int stbi__jpeg_test(stbi__context *s);
854 static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
855 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
856 #endif
857 
858 #ifndef STBI_NO_PNG
859 static int stbi__png_test(stbi__context *s);
860 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
861 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
862 #endif
863 
864 #ifndef STBI_NO_BMP
865 static int stbi__bmp_test(stbi__context *s);
866 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
867 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
868 #endif
869 
870 #ifndef STBI_NO_TGA
871 static int stbi__tga_test(stbi__context *s);
872 static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
873 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
874 #endif
875 
876 #ifndef STBI_NO_PSD
877 static int stbi__psd_test(stbi__context *s);
878 static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
879 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
880 #endif
881 
882 #ifndef STBI_NO_HDR
883 static int stbi__hdr_test(stbi__context *s);
884 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
885 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
886 #endif
887 
888 #ifndef STBI_NO_PIC
889 static int stbi__pic_test(stbi__context *s);
890 static void *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
891 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
892 #endif
893 
894 #ifndef STBI_NO_GIF
895 static int stbi__gif_test(stbi__context *s);
896 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
897 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
898 #endif
899 
900 #ifndef STBI_NO_PNM
901 static int stbi__pnm_test(stbi__context *s);
902 static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
903 static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
904 #endif
905 
906 // this is not threadsafe
907 static const char *stbi__g_failure_reason;
908 
909 STBIDEF const char *stbi_failure_reason(void)
910 {
911  return stbi__g_failure_reason;
912 }
913 
914 static int stbi__err(const char *str)
915 {
916  stbi__g_failure_reason = str;
917  return 0;
918 }
919 
920 static void *stbi__malloc(size_t size)
921 {
922  return STBI_MALLOC(size);
923 }
924 
925 // stb_image uses ints pervasively, including for offset calculations.
926 // therefore the largest decoded image size we can support with the
927 // current code, even on 64-bit targets, is INT_MAX. this is not a
928 // significant limitation for the intended use case.
929 //
930 // we do, however, need to make sure our size calculations don't
931 // overflow. hence a few helper functions for size calculations that
932 // multiply integers together, making sure that they're non-negative
933 // and no overflow occurs.
934 
935 // return 1 if the sum is valid, 0 on overflow.
936 // negative terms are considered invalid.
937 static int stbi__addsizes_valid(int a, int b)
938 {
939  if (b < 0) return 0;
940  // now 0 <= b <= INT_MAX, hence also
941  // 0 <= INT_MAX - b <= INTMAX.
942  // And "a + b <= INT_MAX" (which might overflow) is the
943  // same as a <= INT_MAX - b (no overflow)
944  return a <= INT_MAX - b;
945 }
946 
947 // returns 1 if the product is valid, 0 on overflow.
948 // negative factors are considered invalid.
949 static int stbi__mul2sizes_valid(int a, int b)
950 {
951  if (a < 0 || b < 0) return 0;
952  if (b == 0) return 1; // mul-by-0 is always safe
953  // portable way to check for no overflows in a*b
954  return a <= INT_MAX / b;
955 }
956 
957 // returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
958 static int stbi__mad2sizes_valid(int a, int b, int add)
959 {
960  return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
961 }
962 
963 // returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
964 static int stbi__mad3sizes_valid(int a, int b, int c, int add)
965 {
966  return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
967  stbi__addsizes_valid(a*b*c, add);
968 }
969 
970 // returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
971 static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
972 {
973  return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
974  stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
975 }
976 
977 // mallocs with size overflow checking
978 static void *stbi__malloc_mad2(int a, int b, int add)
979 {
980  if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
981  return stbi__malloc(a*b + add);
982 }
983 
984 static void *stbi__malloc_mad3(int a, int b, int c, int add)
985 {
986  if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
987  return stbi__malloc(a*b*c + add);
988 }
989 
990 static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
991 {
992  if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
993  return stbi__malloc(a*b*c*d + add);
994 }
995 
996 // stbi__err - error
997 // stbi__errpf - error returning pointer to float
998 // stbi__errpuc - error returning pointer to unsigned char
999 
1000 #ifdef STBI_NO_FAILURE_STRINGS
1001 #define stbi__err(x,y) 0
1002 #elif defined(STBI_FAILURE_USERMSG)
1003 #define stbi__err(x,y) stbi__err(y)
1004 #else
1005 #define stbi__err(x,y) stbi__err(x)
1006 #endif
1007 
1008 #define stbi__errpf(x,y) ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
1009 #define stbi__errpuc(x,y) ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
1010 
1011 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
1012 {
1013  STBI_FREE(retval_from_stbi_load);
1014 }
1015 
1016 #ifndef STBI_NO_LINEAR
1017 static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
1018 #endif
1019 
1020 #ifndef STBI_NO_HDR
1021 static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp);
1022 #endif
1023 
1024 #ifdef VULKAN
1025 static int stbi__vertically_flip_on_load = 0;
1026 #else
1027 static int stbi__vertically_flip_on_load = 1;
1028 #endif
1029 
1030 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
1031 {
1032  stbi__vertically_flip_on_load = flag_true_if_should_flip;
1033 }
1034 
1035 static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
1036 {
1037  memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
1038  ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
1039  ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
1040  ri->num_channels = 0;
1041 
1042 #ifndef STBI_NO_JPEG
1043  if (stbi__jpeg_test(s)) return stbi__jpeg_load(s, x, y, comp, req_comp, ri);
1044 #endif
1045 #ifndef STBI_NO_PNG
1046  if (stbi__png_test(s)) return stbi__png_load(s, x, y, comp, req_comp, ri);
1047 #endif
1048 #ifndef STBI_NO_BMP
1049  if (stbi__bmp_test(s)) return stbi__bmp_load(s, x, y, comp, req_comp, ri);
1050 #endif
1051 #ifndef STBI_NO_GIF
1052  if (stbi__gif_test(s)) return stbi__gif_load(s, x, y, comp, req_comp, ri);
1053 #endif
1054 #ifndef STBI_NO_PSD
1055  if (stbi__psd_test(s)) return stbi__psd_load(s, x, y, comp, req_comp, ri, bpc);
1056 #endif
1057 #ifndef STBI_NO_PIC
1058  if (stbi__pic_test(s)) return stbi__pic_load(s, x, y, comp, req_comp, ri);
1059 #endif
1060 #ifndef STBI_NO_PNM
1061  if (stbi__pnm_test(s)) return stbi__pnm_load(s, x, y, comp, req_comp, ri);
1062 #endif
1063 
1064 #ifndef STBI_NO_HDR
1065  if (stbi__hdr_test(s)) {
1066  float *hdr = stbi__hdr_load(s, x, y, comp, req_comp, ri);
1067  return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
1068  }
1069 #endif
1070 
1071 #ifndef STBI_NO_TGA
1072  // test tga last because it's a crappy test!
1073  if (stbi__tga_test(s))
1074  return stbi__tga_load(s, x, y, comp, req_comp, ri);
1075 #endif
1076 
1077  return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
1078 }
1079 
1080 static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
1081 {
1082  int i;
1083  int img_len = w * h * channels;
1084  stbi_uc *reduced;
1085 
1086  reduced = (stbi_uc *)stbi__malloc(img_len);
1087  if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
1088 
1089  for (i = 0; i < img_len; ++i)
1090  reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
1091 
1092  STBI_FREE(orig);
1093  return reduced;
1094 }
1095 
1096 static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
1097 {
1098  int i;
1099  int img_len = w * h * channels;
1100  stbi__uint16 *enlarged;
1101 
1102  enlarged = (stbi__uint16 *)stbi__malloc(img_len * 2);
1103  if (enlarged == NULL) return (stbi__uint16 *)stbi__errpuc("outofmem", "Out of memory");
1104 
1105  for (i = 0; i < img_len; ++i)
1106  enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
1107 
1108  STBI_FREE(orig);
1109  return enlarged;
1110 }
1111 
1112 static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1113 {
1114  stbi__result_info ri;
1115  void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
1116 
1117  if (result == NULL)
1118  return NULL;
1119 
1120  if (ri.bits_per_channel != 8) {
1121  STBI_ASSERT(ri.bits_per_channel == 16);
1122  result = stbi__convert_16_to_8((stbi__uint16 *)result, *x, *y, req_comp == 0 ? *comp : req_comp);
1123  ri.bits_per_channel = 8;
1124  }
1125 
1126  // @TODO: move stbi__convert_format to here
1127 
1128  if (stbi__vertically_flip_on_load) {
1129  int w = *x, h = *y;
1130  int channels = req_comp ? req_comp : *comp;
1131  int row, col, z;
1132  stbi_uc *image = (stbi_uc *)result;
1133 
1134  // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1135  for (row = 0; row < (h >> 1); row++) {
1136  for (col = 0; col < w; col++) {
1137  for (z = 0; z < channels; z++) {
1138  stbi_uc temp = image[(row * w + col) * channels + z];
1139  image[(row * w + col) * channels + z] = image[((h - row - 1) * w + col) * channels + z];
1140  image[((h - row - 1) * w + col) * channels + z] = temp;
1141  }
1142  }
1143  }
1144  }
1145 
1146  return (unsigned char *)result;
1147 }
1148 
1149 static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1150 {
1151  stbi__result_info ri;
1152  void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
1153 
1154  if (result == NULL)
1155  return NULL;
1156 
1157  if (ri.bits_per_channel != 16) {
1158  STBI_ASSERT(ri.bits_per_channel == 8);
1159  result = stbi__convert_8_to_16((stbi_uc *)result, *x, *y, req_comp == 0 ? *comp : req_comp);
1160  ri.bits_per_channel = 16;
1161  }
1162 
1163  // @TODO: move stbi__convert_format16 to here
1164  // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
1165 
1166  if (stbi__vertically_flip_on_load) {
1167  int w = *x, h = *y;
1168  int channels = req_comp ? req_comp : *comp;
1169  int row, col, z;
1170  stbi__uint16 *image = (stbi__uint16 *)result;
1171 
1172  // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1173  for (row = 0; row < (h >> 1); row++) {
1174  for (col = 0; col < w; col++) {
1175  for (z = 0; z < channels; z++) {
1176  stbi__uint16 temp = image[(row * w + col) * channels + z];
1177  image[(row * w + col) * channels + z] = image[((h - row - 1) * w + col) * channels + z];
1178  image[((h - row - 1) * w + col) * channels + z] = temp;
1179  }
1180  }
1181  }
1182  }
1183 
1184  return (stbi__uint16 *)result;
1185 }
1186 
1187 #ifndef STBI_NO_HDR
1188 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1189 {
1190  if (stbi__vertically_flip_on_load && result != NULL) {
1191  int w = *x, h = *y;
1192  int depth = req_comp ? req_comp : *comp;
1193  int row, col, z;
1194  float temp;
1195 
1196  // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1197  for (row = 0; row < (h >> 1); row++) {
1198  for (col = 0; col < w; col++) {
1199  for (z = 0; z < depth; z++) {
1200  temp = result[(row * w + col) * depth + z];
1201  result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1202  result[((h - row - 1) * w + col) * depth + z] = temp;
1203  }
1204  }
1205  }
1206  }
1207 }
1208 #endif
1209 
1210 #ifndef STBI_NO_STDIO
1211 
1212 static FILE *stbi__fopen(char const *filename, char const *mode)
1213 {
1214  FILE *f;
1215 #if defined(_MSC_VER) && _MSC_VER >= 1400
1216  if (0 != fopen_s(&f, filename, mode))
1217  f = 0;
1218 #else
1219  f = fopen(filename, mode);
1220 #endif
1221  return f;
1222 }
1223 
1224 
1225 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1226 {
1227  FILE *f = stbi__fopen(filename, "rb");
1228  unsigned char *result;
1229  if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1230  result = stbi_load_from_file(f, x, y, comp, req_comp);
1231  fclose(f);
1232  return result;
1233 }
1234 
1235 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1236 {
1237  unsigned char *result;
1238  stbi__context s;
1239  stbi__start_file(&s, f);
1240  result = stbi__load_and_postprocess_8bit(&s, x, y, comp, req_comp);
1241  if (result) {
1242  // need to 'unget' all the characters in the IO buffer
1243  fseek(f, -(int)(s.img_buffer_end - s.img_buffer), SEEK_CUR);
1244  }
1245  return result;
1246 }
1247 
1248 STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
1249 {
1250  stbi__uint16 *result;
1251  stbi__context s;
1252  stbi__start_file(&s, f);
1253  result = stbi__load_and_postprocess_16bit(&s, x, y, comp, req_comp);
1254  if (result) {
1255  // need to 'unget' all the characters in the IO buffer
1256  fseek(f, -(int)(s.img_buffer_end - s.img_buffer), SEEK_CUR);
1257  }
1258  return result;
1259 }
1260 
1261 STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
1262 {
1263  FILE *f = stbi__fopen(filename, "rb");
1264  stbi__uint16 *result;
1265  if (!f) return (stbi_us *)stbi__errpuc("can't fopen", "Unable to open file");
1266  result = stbi_load_from_file_16(f, x, y, comp, req_comp);
1267  fclose(f);
1268  return result;
1269 }
1270 
1271 
1272 #endif
1273 
1274 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1275 {
1276  stbi__context s;
1277  stbi__start_mem(&s, buffer, len);
1278  return stbi__load_and_postprocess_8bit(&s, x, y, comp, req_comp);
1279 }
1280 
1281 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1282 {
1283  stbi__context s;
1284  stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1285  return stbi__load_and_postprocess_8bit(&s, x, y, comp, req_comp);
1286 }
1287 
1288 #ifndef STBI_NO_LINEAR
1289 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1290 {
1291  unsigned char *data;
1292 #ifndef STBI_NO_HDR
1293  if (stbi__hdr_test(s)) {
1294  stbi__result_info ri;
1295  float *hdr_data = stbi__hdr_load(s, x, y, comp, req_comp, &ri);
1296  if (hdr_data)
1297  stbi__float_postprocess(hdr_data, x, y, comp, req_comp);
1298  return hdr_data;
1299  }
1300 #endif
1301  data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
1302  if (data)
1303  return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1304  return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1305 }
1306 
1307 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1308 {
1309  stbi__context s;
1310  stbi__start_mem(&s, buffer, len);
1311  return stbi__loadf_main(&s, x, y, comp, req_comp);
1312 }
1313 
1314 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1315 {
1316  stbi__context s;
1317  stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1318  return stbi__loadf_main(&s, x, y, comp, req_comp);
1319 }
1320 
1321 #ifndef STBI_NO_STDIO
1322 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1323 {
1324  float *result;
1325  FILE *f = stbi__fopen(filename, "rb");
1326  if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1327  result = stbi_loadf_from_file(f, x, y, comp, req_comp);
1328  fclose(f);
1329  return result;
1330 }
1331 
1332 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1333 {
1334  stbi__context s;
1335  stbi__start_file(&s, f);
1336  return stbi__loadf_main(&s, x, y, comp, req_comp);
1337 }
1338 #endif // !STBI_NO_STDIO
1339 
1340 #endif // !STBI_NO_LINEAR
1341 
1342 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
1343 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
1344 // reports false!
1345 
1346 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1347 {
1348 #ifndef STBI_NO_HDR
1349  stbi__context s;
1350  stbi__start_mem(&s, buffer, len);
1351  return stbi__hdr_test(&s);
1352 #else
1353  STBI_NOTUSED(buffer);
1354  STBI_NOTUSED(len);
1355  return 0;
1356 #endif
1357 }
1358 
1359 #ifndef STBI_NO_STDIO
1360 STBIDEF int stbi_is_hdr(char const *filename)
1361 {
1362  FILE *f = stbi__fopen(filename, "rb");
1363  int result = 0;
1364  if (f) {
1365  result = stbi_is_hdr_from_file(f);
1366  fclose(f);
1367  }
1368  return result;
1369 }
1370 
1371 STBIDEF int stbi_is_hdr_from_file(FILE *f)
1372 {
1373 #ifndef STBI_NO_HDR
1374  stbi__context s;
1375  stbi__start_file(&s, f);
1376  return stbi__hdr_test(&s);
1377 #else
1378  STBI_NOTUSED(f);
1379  return 0;
1380 #endif
1381 }
1382 #endif // !STBI_NO_STDIO
1383 
1384 STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1385 {
1386 #ifndef STBI_NO_HDR
1387  stbi__context s;
1388  stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1389  return stbi__hdr_test(&s);
1390 #else
1391  STBI_NOTUSED(clbk);
1392  STBI_NOTUSED(user);
1393  return 0;
1394 #endif
1395 }
1396 
1397 #ifndef STBI_NO_LINEAR
1398 static float stbi__l2h_gamma = 2.2f, stbi__l2h_scale = 1.0f;
1399 
1400 STBIDEF void stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1401 STBIDEF void stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1402 #endif
1403 
1404 static float stbi__h2l_gamma_i = 1.0f / 2.2f, stbi__h2l_scale_i = 1.0f;
1405 
1406 STBIDEF void stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1 / gamma; }
1407 STBIDEF void stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1 / scale; }
1408 
1409 
1411 //
1412 // Common code used by all image loaders
1413 //
1414 
1415 enum
1416 {
1417  STBI__SCAN_load = 0,
1418  STBI__SCAN_type,
1419  STBI__SCAN_header
1420 };
1421 
1422 static void stbi__refill_buffer(stbi__context *s)
1423 {
1424  int n = (s->io.read)(s->io_user_data, (char*)s->buffer_start, s->buflen);
1425  if (n == 0) {
1426  // at end of file, treat same as if from memory, but need to handle case
1427  // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1428  s->read_from_callbacks = 0;
1429  s->img_buffer = s->buffer_start;
1430  s->img_buffer_end = s->buffer_start + 1;
1431  *s->img_buffer = 0;
1432  }
1433  else {
1434  s->img_buffer = s->buffer_start;
1435  s->img_buffer_end = s->buffer_start + n;
1436  }
1437 }
1438 
1439 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1440 {
1441  if (s->img_buffer < s->img_buffer_end)
1442  return *s->img_buffer++;
1443  if (s->read_from_callbacks) {
1444  stbi__refill_buffer(s);
1445  return *s->img_buffer++;
1446  }
1447  return 0;
1448 }
1449 
1450 stbi_inline static int stbi__at_eof(stbi__context *s)
1451 {
1452  if (s->io.read) {
1453  if (!(s->io.eof)(s->io_user_data)) return 0;
1454  // if feof() is true, check if buffer = end
1455  // special case: we've only got the special 0 character at the end
1456  if (s->read_from_callbacks == 0) return 1;
1457  }
1458 
1459  return s->img_buffer >= s->img_buffer_end;
1460 }
1461 
1462 static void stbi__skip(stbi__context *s, int n)
1463 {
1464  if (n < 0) {
1465  s->img_buffer = s->img_buffer_end;
1466  return;
1467  }
1468  if (s->io.read) {
1469  int blen = (int)(s->img_buffer_end - s->img_buffer);
1470  if (blen < n) {
1471  s->img_buffer = s->img_buffer_end;
1472  (s->io.skip)(s->io_user_data, n - blen);
1473  return;
1474  }
1475  }
1476  s->img_buffer += n;
1477 }
1478 
1479 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1480 {
1481  if (s->io.read) {
1482  int blen = (int)(s->img_buffer_end - s->img_buffer);
1483  if (blen < n) {
1484  int res, count;
1485 
1486  memcpy(buffer, s->img_buffer, blen);
1487 
1488  count = (s->io.read)(s->io_user_data, (char*)buffer + blen, n - blen);
1489  res = (count == (n - blen));
1490  s->img_buffer = s->img_buffer_end;
1491  return res;
1492  }
1493  }
1494 
1495  if (s->img_buffer + n <= s->img_buffer_end) {
1496  memcpy(buffer, s->img_buffer, n);
1497  s->img_buffer += n;
1498  return 1;
1499  }
1500  else
1501  return 0;
1502 }
1503 
1504 static int stbi__get16be(stbi__context *s)
1505 {
1506  int z = stbi__get8(s);
1507  return (z << 8) + stbi__get8(s);
1508 }
1509 
1510 static stbi__uint32 stbi__get32be(stbi__context *s)
1511 {
1512  stbi__uint32 z = stbi__get16be(s);
1513  return (z << 16) + stbi__get16be(s);
1514 }
1515 
1516 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1517 // nothing
1518 #else
1519 static int stbi__get16le(stbi__context *s)
1520 {
1521  int z = stbi__get8(s);
1522  return z + (stbi__get8(s) << 8);
1523 }
1524 #endif
1525 
1526 #ifndef STBI_NO_BMP
1527 static stbi__uint32 stbi__get32le(stbi__context *s)
1528 {
1529  stbi__uint32 z = stbi__get16le(s);
1530  return z + (stbi__get16le(s) << 16);
1531 }
1532 #endif
1533 
1534 #define STBI__BYTECAST(x) ((stbi_uc) ((x) & 255)) // truncate int to byte without warnings
1535 
1536 
1538 //
1539 // generic converter from built-in img_n to req_comp
1540 // individual types do this automatically as much as possible (e.g. jpeg
1541 // does all cases internally since it needs to colorspace convert anyway,
1542 // and it never has alpha, so very few cases ). png can automatically
1543 // interleave an alpha=255 channel, but falls back to this for other cases
1544 //
1545 // assume data buffer is malloced, so malloc a new one and free that one
1546 // only failure mode is malloc failing
1547 
1548 static stbi_uc stbi__compute_y(int r, int g, int b)
1549 {
1550  return (stbi_uc)(((r * 77) + (g * 150) + (29 * b)) >> 8);
1551 }
1552 
1553 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1554 {
1555  int i, j;
1556  unsigned char *good;
1557 
1558  if (req_comp == img_n) return data;
1559  STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1560 
1561  good = (unsigned char *)stbi__malloc_mad3(req_comp, x, y, 0);
1562  if (good == NULL) {
1563  STBI_FREE(data);
1564  return stbi__errpuc("outofmem", "Out of memory");
1565  }
1566 
1567  for (j = 0; j < (int)y; ++j) {
1568  unsigned char *src = data + j * x * img_n;
1569  unsigned char *dest = good + j * x * req_comp;
1570 
1571 #define STBI__COMBO(a,b) ((a)*8+(b))
1572 #define STBI__CASE(a,b) case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1573  // convert source image with img_n components to one with req_comp components;
1574  // avoid switch per pixel, so use switch per scanline and massive macros
1575  switch (STBI__COMBO(img_n, req_comp)) {
1576  STBI__CASE(1, 2) { dest[0] = src[0], dest[1] = 255; } break;
1577  STBI__CASE(1, 3) { dest[0] = dest[1] = dest[2] = src[0]; } break;
1578  STBI__CASE(1, 4) { dest[0] = dest[1] = dest[2] = src[0], dest[3] = 255; } break;
1579  STBI__CASE(2, 1) { dest[0] = src[0]; } break;
1580  STBI__CASE(2, 3) { dest[0] = dest[1] = dest[2] = src[0]; } break;
1581  STBI__CASE(2, 4) { dest[0] = dest[1] = dest[2] = src[0], dest[3] = src[1]; } break;
1582  STBI__CASE(3, 4) { dest[0] = src[0], dest[1] = src[1], dest[2] = src[2], dest[3] = 255; } break;
1583  STBI__CASE(3, 1) { dest[0] = stbi__compute_y(src[0], src[1], src[2]); } break;
1584  STBI__CASE(3, 2) { dest[0] = stbi__compute_y(src[0], src[1], src[2]), dest[1] = 255; } break;
1585  STBI__CASE(4, 1) { dest[0] = stbi__compute_y(src[0], src[1], src[2]); } break;
1586  STBI__CASE(4, 2) { dest[0] = stbi__compute_y(src[0], src[1], src[2]), dest[1] = src[3]; } break;
1587  STBI__CASE(4, 3) { dest[0] = src[0], dest[1] = src[1], dest[2] = src[2]; } break;
1588  default: STBI_ASSERT(0);
1589  }
1590 #undef STBI__CASE
1591  }
1592 
1593  STBI_FREE(data);
1594  return good;
1595 }
1596 
1597 static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
1598 {
1599  return (stbi__uint16)(((r * 77) + (g * 150) + (29 * b)) >> 8);
1600 }
1601 
1602 static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1603 {
1604  int i, j;
1605  stbi__uint16 *good;
1606 
1607  if (req_comp == img_n) return data;
1608  STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1609 
1610  good = (stbi__uint16 *)stbi__malloc(req_comp * x * y * 2);
1611  if (good == NULL) {
1612  STBI_FREE(data);
1613  return (stbi__uint16 *)stbi__errpuc("outofmem", "Out of memory");
1614  }
1615 
1616  for (j = 0; j < (int)y; ++j) {
1617  stbi__uint16 *src = data + j * x * img_n;
1618  stbi__uint16 *dest = good + j * x * req_comp;
1619 
1620 #define STBI__COMBO(a,b) ((a)*8+(b))
1621 #define STBI__CASE(a,b) case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1622  // convert source image with img_n components to one with req_comp components;
1623  // avoid switch per pixel, so use switch per scanline and massive macros
1624  switch (STBI__COMBO(img_n, req_comp)) {
1625  STBI__CASE(1, 2) { dest[0] = src[0], dest[1] = 0xffff; } break;
1626  STBI__CASE(1, 3) { dest[0] = dest[1] = dest[2] = src[0]; } break;
1627  STBI__CASE(1, 4) { dest[0] = dest[1] = dest[2] = src[0], dest[3] = 0xffff; } break;
1628  STBI__CASE(2, 1) { dest[0] = src[0]; } break;
1629  STBI__CASE(2, 3) { dest[0] = dest[1] = dest[2] = src[0]; } break;
1630  STBI__CASE(2, 4) { dest[0] = dest[1] = dest[2] = src[0], dest[3] = src[1]; } break;
1631  STBI__CASE(3, 4) { dest[0] = src[0], dest[1] = src[1], dest[2] = src[2], dest[3] = 0xffff; } break;
1632  STBI__CASE(3, 1) { dest[0] = stbi__compute_y_16(src[0], src[1], src[2]); } break;
1633  STBI__CASE(3, 2) { dest[0] = stbi__compute_y_16(src[0], src[1], src[2]), dest[1] = 0xffff; } break;
1634  STBI__CASE(4, 1) { dest[0] = stbi__compute_y_16(src[0], src[1], src[2]); } break;
1635  STBI__CASE(4, 2) { dest[0] = stbi__compute_y_16(src[0], src[1], src[2]), dest[1] = src[3]; } break;
1636  STBI__CASE(4, 3) { dest[0] = src[0], dest[1] = src[1], dest[2] = src[2]; } break;
1637  default: STBI_ASSERT(0);
1638  }
1639 #undef STBI__CASE
1640  }
1641 
1642  STBI_FREE(data);
1643  return good;
1644 }
1645 
1646 #ifndef STBI_NO_LINEAR
1647 static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1648 {
1649  int i, k, n;
1650  float *output;
1651  if (!data) return NULL;
1652  output = (float *)stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
1653  if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1654  // compute number of non-alpha components
1655  if (comp & 1) n = comp; else n = comp - 1;
1656  for (i = 0; i < x*y; ++i) {
1657  for (k = 0; k < n; ++k) {
1658  output[i*comp + k] = (float)(pow(data[i*comp + k] / 255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1659  }
1660  if (k < comp) output[i*comp + k] = data[i*comp + k] / 255.0f;
1661  }
1662  STBI_FREE(data);
1663  return output;
1664 }
1665 #endif
1666 
1667 #ifndef STBI_NO_HDR
1668 #define stbi__float2int(x) ((int) (x))
1669 static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp)
1670 {
1671  int i, k, n;
1672  stbi_uc *output;
1673  if (!data) return NULL;
1674  output = (stbi_uc *)stbi__malloc_mad3(x, y, comp, 0);
1675  if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1676  // compute number of non-alpha components
1677  if (comp & 1) n = comp; else n = comp - 1;
1678  for (i = 0; i < x*y; ++i) {
1679  for (k = 0; k < n; ++k) {
1680  float z = (float)pow(data[i*comp + k] * stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1681  if (z < 0) z = 0;
1682  if (z > 255) z = 255;
1683  output[i*comp + k] = (stbi_uc)stbi__float2int(z);
1684  }
1685  if (k < comp) {
1686  float z = data[i*comp + k] * 255 + 0.5f;
1687  if (z < 0) z = 0;
1688  if (z > 255) z = 255;
1689  output[i*comp + k] = (stbi_uc)stbi__float2int(z);
1690  }
1691  }
1692  STBI_FREE(data);
1693  return output;
1694 }
1695 #endif
1696 
1698 //
1699 // "baseline" JPEG/JFIF decoder
1700 //
1701 // simple implementation
1702 // - doesn't support delayed output of y-dimension
1703 // - simple interface (only one output format: 8-bit interleaved RGB)
1704 // - doesn't try to recover corrupt jpegs
1705 // - doesn't allow partial loading, loading multiple at once
1706 // - still fast on x86 (copying globals into locals doesn't help x86)
1707 // - allocates lots of intermediate memory (full size of all components)
1708 // - non-interleaved case requires this anyway
1709 // - allows good upsampling (see next)
1710 // high-quality
1711 // - upsampled channels are bilinearly interpolated, even across blocks
1712 // - quality integer IDCT derived from IJG's 'slow'
1713 // performance
1714 // - fast huffman; reasonable integer IDCT
1715 // - some SIMD kernels for common paths on targets with SSE2/NEON
1716 // - uses a lot of intermediate memory, could cache poorly
1717 
1718 #ifndef STBI_NO_JPEG
1719 
1720 // huffman decoding acceleration
1721 #define FAST_BITS 9 // larger handles more cases; smaller stomps less cache
1722 
1723 typedef struct
1724 {
1725  stbi_uc fast[1 << FAST_BITS];
1726  // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1727  stbi__uint16 code[256];
1728  stbi_uc values[256];
1729  stbi_uc size[257];
1730  unsigned int maxcode[18];
1731  int delta[17]; // old 'firstsymbol' - old 'firstcode'
1732 } stbi__huffman;
1733 
1734 typedef struct
1735 {
1736  stbi__context *s;
1737  stbi__huffman huff_dc[4];
1738  stbi__huffman huff_ac[4];
1739  stbi_uc dequant[4][64];
1740  stbi__int16 fast_ac[4][1 << FAST_BITS];
1741 
1742  // sizes for components, interleaved MCUs
1743  int img_h_max, img_v_max;
1744  int img_mcu_x, img_mcu_y;
1745  int img_mcu_w, img_mcu_h;
1746 
1747  // definition of jpeg image component
1748  struct
1749  {
1750  int id;
1751  int h, v;
1752  int tq;
1753  int hd, ha;
1754  int dc_pred;
1755 
1756  int x, y, w2, h2;
1757  stbi_uc *data;
1758  void *raw_data, *raw_coeff;
1759  stbi_uc *linebuf;
1760  short *coeff; // progressive only
1761  int coeff_w, coeff_h; // number of 8x8 coefficient blocks
1762  } img_comp[4];
1763 
1764  stbi__uint32 code_buffer; // jpeg entropy-coded buffer
1765  int code_bits; // number of valid bits
1766  unsigned char marker; // marker seen while filling entropy buffer
1767  int nomore; // flag if we saw a marker so must stop
1768 
1769  int progressive;
1770  int spec_start;
1771  int spec_end;
1772  int succ_high;
1773  int succ_low;
1774  int eob_run;
1775  int rgb;
1776 
1777  int scan_n, order[4];
1778  int restart_interval, todo;
1779 
1780  // kernels
1781  void(*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1782  void(*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1783  stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1784 } stbi__jpeg;
1785 
1786 static int stbi__build_huffman(stbi__huffman *h, int *count)
1787 {
1788  int i, j, k = 0, code;
1789  // build size list for each symbol (from JPEG spec)
1790  for (i = 0; i < 16; ++i)
1791  for (j = 0; j < count[i]; ++j)
1792  h->size[k++] = (stbi_uc)(i + 1);
1793  h->size[k] = 0;
1794 
1795  // compute actual symbols (from jpeg spec)
1796  code = 0;
1797  k = 0;
1798  for (j = 1; j <= 16; ++j) {
1799  // compute delta to add to code to compute symbol id
1800  h->delta[j] = k - code;
1801  if (h->size[k] == j) {
1802  while (h->size[k] == j)
1803  h->code[k++] = (stbi__uint16)(code++);
1804  if (code - 1 >= (1 << j)) return stbi__err("bad code lengths", "Corrupt JPEG");
1805  }
1806  // compute largest code + 1 for this size, preshifted as needed later
1807  h->maxcode[j] = code << (16 - j);
1808  code <<= 1;
1809  }
1810  h->maxcode[j] = 0xffffffff;
1811 
1812  // build non-spec acceleration table; 255 is flag for not-accelerated
1813  memset(h->fast, 255, 1 << FAST_BITS);
1814  for (i = 0; i < k; ++i) {
1815  int s = h->size[i];
1816  if (s <= FAST_BITS) {
1817  int c = h->code[i] << (FAST_BITS - s);
1818  int m = 1 << (FAST_BITS - s);
1819  for (j = 0; j < m; ++j) {
1820  h->fast[c + j] = (stbi_uc)i;
1821  }
1822  }
1823  }
1824  return 1;
1825 }
1826 
1827 // build a table that decodes both magnitude and value of small ACs in
1828 // one go.
1829 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
1830 {
1831  int i;
1832  for (i = 0; i < (1 << FAST_BITS); ++i) {
1833  stbi_uc fast = h->fast[i];
1834  fast_ac[i] = 0;
1835  if (fast < 255) {
1836  int rs = h->values[fast];
1837  int run = (rs >> 4) & 15;
1838  int magbits = rs & 15;
1839  int len = h->size[fast];
1840 
1841  if (magbits && len + magbits <= FAST_BITS) {
1842  // magnitude code followed by receive_extend code
1843  int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
1844  int m = 1 << (magbits - 1);
1845  if (k < m) k += (-1 << magbits) + 1;
1846  // if the result is small enough, we can fit it in fast_ac table
1847  if (k >= -128 && k <= 127)
1848  fast_ac[i] = (stbi__int16)((k << 8) + (run << 4) + (len + magbits));
1849  }
1850  }
1851  }
1852 }
1853 
1854 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
1855 {
1856  do {
1857  int b = j->nomore ? 0 : stbi__get8(j->s);
1858  if (b == 0xff) {
1859  int c = stbi__get8(j->s);
1860  if (c != 0) {
1861  j->marker = (unsigned char)c;
1862  j->nomore = 1;
1863  return;
1864  }
1865  }
1866  j->code_buffer |= b << (24 - j->code_bits);
1867  j->code_bits += 8;
1868  } while (j->code_bits <= 24);
1869 }
1870 
1871 // (1 << n) - 1
1872 static stbi__uint32 stbi__bmask[17] = { 0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535 };
1873 
1874 // decode a jpeg huffman value from the bitstream
1875 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
1876 {
1877  unsigned int temp;
1878  int c, k;
1879 
1880  if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1881 
1882  // look at the top FAST_BITS and determine what symbol ID it is,
1883  // if the code is <= FAST_BITS
1884  c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS) - 1);
1885  k = h->fast[c];
1886  if (k < 255) {
1887  int s = h->size[k];
1888  if (s > j->code_bits)
1889  return -1;
1890  j->code_buffer <<= s;
1891  j->code_bits -= s;
1892  return h->values[k];
1893  }
1894 
1895  // naive test is to shift the code_buffer down so k bits are
1896  // valid, then test against maxcode. To speed this up, we've
1897  // preshifted maxcode left so that it has (16-k) 0s at the
1898  // end; in other words, regardless of the number of bits, it
1899  // wants to be compared against something shifted to have 16;
1900  // that way we don't need to shift inside the loop.
1901  temp = j->code_buffer >> 16;
1902  for (k = FAST_BITS + 1; ; ++k)
1903  if (temp < h->maxcode[k])
1904  break;
1905  if (k == 17) {
1906  // error! code not found
1907  j->code_bits -= 16;
1908  return -1;
1909  }
1910 
1911  if (k > j->code_bits)
1912  return -1;
1913 
1914  // convert the huffman code to the symbol id
1915  c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
1916  STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
1917 
1918  // convert the id to a symbol
1919  j->code_bits -= k;
1920  j->code_buffer <<= k;
1921  return h->values[c];
1922 }
1923 
1924 // bias[n] = (-1<<n) + 1
1925 static int const stbi__jbias[16] = { 0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767 };
1926 
1927 // combined JPEG 'receive' and JPEG 'extend', since baseline
1928 // always extends everything it receives.
1929 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
1930 {
1931  unsigned int k;
1932  int sgn;
1933  if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1934 
1935  sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
1936  k = stbi_lrot(j->code_buffer, n);
1937  STBI_ASSERT(n >= 0 && n < (int)(sizeof(stbi__bmask) / sizeof(*stbi__bmask)));
1938  j->code_buffer = k & ~stbi__bmask[n];
1939  k &= stbi__bmask[n];
1940  j->code_bits -= n;
1941  return k + (stbi__jbias[n] & ~sgn);
1942 }
1943 
1944 // get some unsigned bits
1945 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
1946 {
1947  unsigned int k;
1948  if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1949  k = stbi_lrot(j->code_buffer, n);
1950  j->code_buffer = k & ~stbi__bmask[n];
1951  k &= stbi__bmask[n];
1952  j->code_bits -= n;
1953  return k;
1954 }
1955 
1956 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
1957 {
1958  unsigned int k;
1959  if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
1960  k = j->code_buffer;
1961  j->code_buffer <<= 1;
1962  --j->code_bits;
1963  return k & 0x80000000;
1964 }
1965 
1966 // given a value that's at position X in the zigzag stream,
1967 // where does it appear in the 8x8 matrix coded as row-major?
1968 static stbi_uc stbi__jpeg_dezigzag[64 + 15] =
1969 {
1970  0, 1, 8, 16, 9, 2, 3, 10,
1971  17, 24, 32, 25, 18, 11, 4, 5,
1972  12, 19, 26, 33, 40, 48, 41, 34,
1973  27, 20, 13, 6, 7, 14, 21, 28,
1974  35, 42, 49, 56, 57, 50, 43, 36,
1975  29, 22, 15, 23, 30, 37, 44, 51,
1976  58, 59, 52, 45, 38, 31, 39, 46,
1977  53, 60, 61, 54, 47, 55, 62, 63,
1978  // let corrupt input sample past end
1979  63, 63, 63, 63, 63, 63, 63, 63,
1980  63, 63, 63, 63, 63, 63, 63
1981 };
1982 
1983 // decode one 64-entry block--
1984 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
1985 {
1986  int diff, dc, k;
1987  int t;
1988 
1989  if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1990  t = stbi__jpeg_huff_decode(j, hdc);
1991  if (t < 0) return stbi__err("bad huffman code", "Corrupt JPEG");
1992 
1993  // 0 all the ac values now so we can do it 32-bits at a time
1994  memset(data, 0, 64 * sizeof(data[0]));
1995 
1996  diff = t ? stbi__extend_receive(j, t) : 0;
1997  dc = j->img_comp[b].dc_pred + diff;
1998  j->img_comp[b].dc_pred = dc;
1999  data[0] = (short)(dc * dequant[0]);
2000 
2001  // decode AC components, see JPEG spec
2002  k = 1;
2003  do {
2004  unsigned int zig;
2005  int c, r, s;
2006  if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2007  c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS) - 1);
2008  r = fac[c];
2009  if (r) { // fast-AC path
2010  k += (r >> 4) & 15; // run
2011  s = r & 15; // combined length
2012  j->code_buffer <<= s;
2013  j->code_bits -= s;
2014  // decode into unzigzag'd location
2015  zig = stbi__jpeg_dezigzag[k++];
2016  data[zig] = (short)((r >> 8) * dequant[zig]);
2017  }
2018  else {
2019  int rs = stbi__jpeg_huff_decode(j, hac);
2020  if (rs < 0) return stbi__err("bad huffman code", "Corrupt JPEG");
2021  s = rs & 15;
2022  r = rs >> 4;
2023  if (s == 0) {
2024  if (rs != 0xf0) break; // end block
2025  k += 16;
2026  }
2027  else {
2028  k += r;
2029  // decode into unzigzag'd location
2030  zig = stbi__jpeg_dezigzag[k++];
2031  data[zig] = (short)(stbi__extend_receive(j, s) * dequant[zig]);
2032  }
2033  }
2034  } while (k < 64);
2035  return 1;
2036 }
2037 
2038 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
2039 {
2040  int diff, dc;
2041  int t;
2042  if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2043 
2044  if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2045 
2046  if (j->succ_high == 0) {
2047  // first scan for DC coefficient, must be first
2048  memset(data, 0, 64 * sizeof(data[0])); // 0 all the ac values now
2049  t = stbi__jpeg_huff_decode(j, hdc);
2050  diff = t ? stbi__extend_receive(j, t) : 0;
2051 
2052  dc = j->img_comp[b].dc_pred + diff;
2053  j->img_comp[b].dc_pred = dc;
2054  data[0] = (short)(dc << j->succ_low);
2055  }
2056  else {
2057  // refinement scan for DC coefficient
2058  if (stbi__jpeg_get_bit(j))
2059  data[0] += (short)(1 << j->succ_low);
2060  }
2061  return 1;
2062 }
2063 
2064 // @OPTIMIZE: store non-zigzagged during the decode passes,
2065 // and only de-zigzag when dequantizing
2066 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
2067 {
2068  int k;
2069  if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2070 
2071  if (j->succ_high == 0) {
2072  int shift = j->succ_low;
2073 
2074  if (j->eob_run) {
2075  --j->eob_run;
2076  return 1;
2077  }
2078 
2079  k = j->spec_start;
2080  do {
2081  unsigned int zig;
2082  int c, r, s;
2083  if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2084  c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS) - 1);
2085  r = fac[c];
2086  if (r) { // fast-AC path
2087  k += (r >> 4) & 15; // run
2088  s = r & 15; // combined length
2089  j->code_buffer <<= s;
2090  j->code_bits -= s;
2091  zig = stbi__jpeg_dezigzag[k++];
2092  data[zig] = (short)((r >> 8) << shift);
2093  }
2094  else {
2095  int rs = stbi__jpeg_huff_decode(j, hac);
2096  if (rs < 0) return stbi__err("bad huffman code", "Corrupt JPEG");
2097  s = rs & 15;
2098  r = rs >> 4;
2099  if (s == 0) {
2100  if (r < 15) {
2101  j->eob_run = (1 << r);
2102  if (r)
2103  j->eob_run += stbi__jpeg_get_bits(j, r);
2104  --j->eob_run;
2105  break;
2106  }
2107  k += 16;
2108  }
2109  else {
2110  k += r;
2111  zig = stbi__jpeg_dezigzag[k++];
2112  data[zig] = (short)(stbi__extend_receive(j, s) << shift);
2113  }
2114  }
2115  } while (k <= j->spec_end);
2116  }
2117  else {
2118  // refinement scan for these AC coefficients
2119 
2120  short bit = (short)(1 << j->succ_low);
2121 
2122  if (j->eob_run) {
2123  --j->eob_run;
2124  for (k = j->spec_start; k <= j->spec_end; ++k) {
2125  short *p = &data[stbi__jpeg_dezigzag[k]];
2126  if (*p != 0)
2127  if (stbi__jpeg_get_bit(j))
2128  if ((*p & bit) == 0) {
2129  if (*p > 0)
2130  *p += bit;
2131  else
2132  *p -= bit;
2133  }
2134  }
2135  }
2136  else {
2137  k = j->spec_start;
2138  do {
2139  int r, s;
2140  int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
2141  if (rs < 0) return stbi__err("bad huffman code", "Corrupt JPEG");
2142  s = rs & 15;
2143  r = rs >> 4;
2144  if (s == 0) {
2145  if (r < 15) {
2146  j->eob_run = (1 << r) - 1;
2147  if (r)
2148  j->eob_run += stbi__jpeg_get_bits(j, r);
2149  r = 64; // force end of block
2150  }
2151  else {
2152  // r=15 s=0 should write 16 0s, so we just do
2153  // a run of 15 0s and then write s (which is 0),
2154  // so we don't have to do anything special here
2155  }
2156  }
2157  else {
2158  if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
2159  // sign bit
2160  if (stbi__jpeg_get_bit(j))
2161  s = bit;
2162  else
2163  s = -bit;
2164  }
2165 
2166  // advance by r
2167  while (k <= j->spec_end) {
2168  short *p = &data[stbi__jpeg_dezigzag[k++]];
2169  if (*p != 0) {
2170  if (stbi__jpeg_get_bit(j))
2171  if ((*p & bit) == 0) {
2172  if (*p > 0)
2173  *p += bit;
2174  else
2175  *p -= bit;
2176  }
2177  }
2178  else {
2179  if (r == 0) {
2180  *p = (short)s;
2181  break;
2182  }
2183  --r;
2184  }
2185  }
2186  } while (k <= j->spec_end);
2187  }
2188  }
2189  return 1;
2190 }
2191 
2192 // take a -128..127 value and stbi__clamp it and convert to 0..255
2193 stbi_inline static stbi_uc stbi__clamp(int x)
2194 {
2195  // trick to use a single test to catch both cases
2196  if ((unsigned int)x > 255) {
2197  if (x < 0) return 0;
2198  if (x > 255) return 255;
2199  }
2200  return (stbi_uc)x;
2201 }
2202 
2203 #define stbi__f2f(x) ((int) (((x) * 4096 + 0.5)))
2204 #define stbi__fsh(x) ((x) << 12)
2205 
2206 // derived from jidctint -- DCT_ISLOW
2207 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
2208  int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
2209  p2 = s2; \
2210  p3 = s6; \
2211  p1 = (p2+p3) * stbi__f2f(0.5411961f); \
2212  t2 = p1 + p3*stbi__f2f(-1.847759065f); \
2213  t3 = p1 + p2*stbi__f2f( 0.765366865f); \
2214  p2 = s0; \
2215  p3 = s4; \
2216  t0 = stbi__fsh(p2+p3); \
2217  t1 = stbi__fsh(p2-p3); \
2218  x0 = t0+t3; \
2219  x3 = t0-t3; \
2220  x1 = t1+t2; \
2221  x2 = t1-t2; \
2222  t0 = s7; \
2223  t1 = s5; \
2224  t2 = s3; \
2225  t3 = s1; \
2226  p3 = t0+t2; \
2227  p4 = t1+t3; \
2228  p1 = t0+t3; \
2229  p2 = t1+t2; \
2230  p5 = (p3+p4)*stbi__f2f( 1.175875602f); \
2231  t0 = t0*stbi__f2f( 0.298631336f); \
2232  t1 = t1*stbi__f2f( 2.053119869f); \
2233  t2 = t2*stbi__f2f( 3.072711026f); \
2234  t3 = t3*stbi__f2f( 1.501321110f); \
2235  p1 = p5 + p1*stbi__f2f(-0.899976223f); \
2236  p2 = p5 + p2*stbi__f2f(-2.562915447f); \
2237  p3 = p3*stbi__f2f(-1.961570560f); \
2238  p4 = p4*stbi__f2f(-0.390180644f); \
2239  t3 += p1+p4; \
2240  t2 += p2+p3; \
2241  t1 += p2+p4; \
2242  t0 += p1+p3;
2243 
2244 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
2245 {
2246  int i, val[64], *v = val;
2247  stbi_uc *o;
2248  short *d = data;
2249 
2250  // columns
2251  for (i = 0; i < 8; ++i, ++d, ++v) {
2252  // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
2253  if (d[8] == 0 && d[16] == 0 && d[24] == 0 && d[32] == 0
2254  && d[40] == 0 && d[48] == 0 && d[56] == 0) {
2255  // no shortcut 0 seconds
2256  // (1|2|3|4|5|6|7)==0 0 seconds
2257  // all separate -0.047 seconds
2258  // 1 && 2|3 && 4|5 && 6|7: -0.047 seconds
2259  int dcterm = d[0] << 2;
2260  v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
2261  }
2262  else {
2263  STBI__IDCT_1D(d[0], d[8], d[16], d[24], d[32], d[40], d[48], d[56])
2264  // constants scaled things up by 1<<12; let's bring them back
2265  // down, but keep 2 extra bits of precision
2266  x0 += 512; x1 += 512; x2 += 512; x3 += 512;
2267  v[0] = (x0 + t3) >> 10;
2268  v[56] = (x0 - t3) >> 10;
2269  v[8] = (x1 + t2) >> 10;
2270  v[48] = (x1 - t2) >> 10;
2271  v[16] = (x2 + t1) >> 10;
2272  v[40] = (x2 - t1) >> 10;
2273  v[24] = (x3 + t0) >> 10;
2274  v[32] = (x3 - t0) >> 10;
2275  }
2276  }
2277 
2278  for (i = 0, v = val, o = out; i < 8; ++i, v += 8, o += out_stride) {
2279  // no fast case since the first 1D IDCT spread components out
2280  STBI__IDCT_1D(v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7])
2281  // constants scaled things up by 1<<12, plus we had 1<<2 from first
2282  // loop, plus horizontal and vertical each scale by sqrt(8) so together
2283  // we've got an extra 1<<3, so 1<<17 total we need to remove.
2284  // so we want to round that, which means adding 0.5 * 1<<17,
2285  // aka 65536. Also, we'll end up with -128 to 127 that we want
2286  // to encode as 0..255 by adding 128, so we'll add that before the shift
2287  x0 += 65536 + (128 << 17);
2288  x1 += 65536 + (128 << 17);
2289  x2 += 65536 + (128 << 17);
2290  x3 += 65536 + (128 << 17);
2291  // tried computing the shifts into temps, or'ing the temps to see
2292  // if any were out of range, but that was slower
2293  o[0] = stbi__clamp((x0 + t3) >> 17);
2294  o[7] = stbi__clamp((x0 - t3) >> 17);
2295  o[1] = stbi__clamp((x1 + t2) >> 17);
2296  o[6] = stbi__clamp((x1 - t2) >> 17);
2297  o[2] = stbi__clamp((x2 + t1) >> 17);
2298  o[5] = stbi__clamp((x2 - t1) >> 17);
2299  o[3] = stbi__clamp((x3 + t0) >> 17);
2300  o[4] = stbi__clamp((x3 - t0) >> 17);
2301  }
2302 }
2303 
2304 #ifdef STBI_SSE2
2305 // sse2 integer IDCT. not the fastest possible implementation but it
2306 // produces bit-identical results to the generic C version so it's
2307 // fully "transparent".
2308 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2309 {
2310  // This is constructed to match our regular (generic) integer IDCT exactly.
2311  __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2312  __m128i tmp;
2313 
2314  // dot product constant: even elems=x, odd elems=y
2315 #define dct_const(x,y) _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
2316 
2317  // out(0) = c0[even]*x + c0[odd]*y (c0, x, y 16-bit, out 32-bit)
2318  // out(1) = c1[even]*x + c1[odd]*y
2319 #define dct_rot(out0,out1, x,y,c0,c1) \
2320  __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
2321  __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
2322  __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2323  __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2324  __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2325  __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2326 
2327  // out = in << 12 (in 16-bit, out 32-bit)
2328 #define dct_widen(out, in) \
2329  __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2330  __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2331 
2332  // wide add
2333 #define dct_wadd(out, a, b) \
2334  __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2335  __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2336 
2337  // wide sub
2338 #define dct_wsub(out, a, b) \
2339  __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2340  __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2341 
2342  // butterfly a/b, add bias, then shift by "s" and pack
2343 #define dct_bfly32o(out0, out1, a,b,bias,s) \
2344  { \
2345  __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2346  __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2347  dct_wadd(sum, abiased, b); \
2348  dct_wsub(dif, abiased, b); \
2349  out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2350  out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
2351  }
2352 
2353  // 8-bit interleave step (for transposes)
2354 #define dct_interleave8(a, b) \
2355  tmp = a; \
2356  a = _mm_unpacklo_epi8(a, b); \
2357  b = _mm_unpackhi_epi8(tmp, b)
2358 
2359  // 16-bit interleave step (for transposes)
2360 #define dct_interleave16(a, b) \
2361  tmp = a; \
2362  a = _mm_unpacklo_epi16(a, b); \
2363  b = _mm_unpackhi_epi16(tmp, b)
2364 
2365 #define dct_pass(bias,shift) \
2366  { \
2367  /* even part */ \
2368  dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
2369  __m128i sum04 = _mm_add_epi16(row0, row4); \
2370  __m128i dif04 = _mm_sub_epi16(row0, row4); \
2371  dct_widen(t0e, sum04); \
2372  dct_widen(t1e, dif04); \
2373  dct_wadd(x0, t0e, t3e); \
2374  dct_wsub(x3, t0e, t3e); \
2375  dct_wadd(x1, t1e, t2e); \
2376  dct_wsub(x2, t1e, t2e); \
2377  /* odd part */ \
2378  dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
2379  dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
2380  __m128i sum17 = _mm_add_epi16(row1, row7); \
2381  __m128i sum35 = _mm_add_epi16(row3, row5); \
2382  dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
2383  dct_wadd(x4, y0o, y4o); \
2384  dct_wadd(x5, y1o, y5o); \
2385  dct_wadd(x6, y2o, y5o); \
2386  dct_wadd(x7, y3o, y4o); \
2387  dct_bfly32o(row0,row7, x0,x7,bias,shift); \
2388  dct_bfly32o(row1,row6, x1,x6,bias,shift); \
2389  dct_bfly32o(row2,row5, x2,x5,bias,shift); \
2390  dct_bfly32o(row3,row4, x3,x4,bias,shift); \
2391  }
2392 
2393  __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2394  __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f(0.765366865f), stbi__f2f(0.5411961f));
2395  __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2396  __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2397  __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f(0.298631336f), stbi__f2f(-1.961570560f));
2398  __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f(3.072711026f));
2399  __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f(2.053119869f), stbi__f2f(-0.390180644f));
2400  __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f(1.501321110f));
2401 
2402  // rounding biases in column/row passes, see stbi__idct_block for explanation.
2403  __m128i bias_0 = _mm_set1_epi32(512);
2404  __m128i bias_1 = _mm_set1_epi32(65536 + (128 << 17));
2405 
2406  // load
2407  row0 = _mm_load_si128((const __m128i *) (data + 0 * 8));
2408  row1 = _mm_load_si128((const __m128i *) (data + 1 * 8));
2409  row2 = _mm_load_si128((const __m128i *) (data + 2 * 8));
2410  row3 = _mm_load_si128((const __m128i *) (data + 3 * 8));
2411  row4 = _mm_load_si128((const __m128i *) (data + 4 * 8));
2412  row5 = _mm_load_si128((const __m128i *) (data + 5 * 8));
2413  row6 = _mm_load_si128((const __m128i *) (data + 6 * 8));
2414  row7 = _mm_load_si128((const __m128i *) (data + 7 * 8));
2415 
2416  // column pass
2417  dct_pass(bias_0, 10);
2418 
2419  {
2420  // 16bit 8x8 transpose pass 1
2421  dct_interleave16(row0, row4);
2422  dct_interleave16(row1, row5);
2423  dct_interleave16(row2, row6);
2424  dct_interleave16(row3, row7);
2425 
2426  // transpose pass 2
2427  dct_interleave16(row0, row2);
2428  dct_interleave16(row1, row3);
2429  dct_interleave16(row4, row6);
2430  dct_interleave16(row5, row7);
2431 
2432  // transpose pass 3
2433  dct_interleave16(row0, row1);
2434  dct_interleave16(row2, row3);
2435  dct_interleave16(row4, row5);
2436  dct_interleave16(row6, row7);
2437  }
2438 
2439  // row pass
2440  dct_pass(bias_1, 17);
2441 
2442  {
2443  // pack
2444  __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2445  __m128i p1 = _mm_packus_epi16(row2, row3);
2446  __m128i p2 = _mm_packus_epi16(row4, row5);
2447  __m128i p3 = _mm_packus_epi16(row6, row7);
2448 
2449  // 8bit 8x8 transpose pass 1
2450  dct_interleave8(p0, p2); // a0e0a1e1...
2451  dct_interleave8(p1, p3); // c0g0c1g1...
2452 
2453  // transpose pass 2
2454  dct_interleave8(p0, p1); // a0c0e0g0...
2455  dct_interleave8(p2, p3); // b0d0f0h0...
2456 
2457  // transpose pass 3
2458  dct_interleave8(p0, p2); // a0b0c0d0...
2459  dct_interleave8(p1, p3); // a4b4c4d4...
2460 
2461  // store
2462  _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2463  _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2464  _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2465  _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2466  _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2467  _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2468  _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2469  _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2470  }
2471 
2472 #undef dct_const
2473 #undef dct_rot
2474 #undef dct_widen
2475 #undef dct_wadd
2476 #undef dct_wsub
2477 #undef dct_bfly32o
2478 #undef dct_interleave8
2479 #undef dct_interleave16
2480 #undef dct_pass
2481 }
2482 
2483 #endif // STBI_SSE2
2484 
2485 #ifdef STBI_NEON
2486 
2487 // NEON integer IDCT. should produce bit-identical
2488 // results to the generic C version.
2489 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2490 {
2491  int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2492 
2493  int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2494  int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2495  int16x4_t rot0_2 = vdup_n_s16(stbi__f2f(0.765366865f));
2496  int16x4_t rot1_0 = vdup_n_s16(stbi__f2f(1.175875602f));
2497  int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2498  int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2499  int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2500  int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2501  int16x4_t rot3_0 = vdup_n_s16(stbi__f2f(0.298631336f));
2502  int16x4_t rot3_1 = vdup_n_s16(stbi__f2f(2.053119869f));
2503  int16x4_t rot3_2 = vdup_n_s16(stbi__f2f(3.072711026f));
2504  int16x4_t rot3_3 = vdup_n_s16(stbi__f2f(1.501321110f));
2505 
2506 #define dct_long_mul(out, inq, coeff) \
2507  int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2508  int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2509 
2510 #define dct_long_mac(out, acc, inq, coeff) \
2511  int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2512  int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2513 
2514 #define dct_widen(out, inq) \
2515  int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2516  int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2517 
2518  // wide add
2519 #define dct_wadd(out, a, b) \
2520  int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2521  int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2522 
2523  // wide sub
2524 #define dct_wsub(out, a, b) \
2525  int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2526  int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2527 
2528  // butterfly a/b, then shift using "shiftop" by "s" and pack
2529 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2530  { \
2531  dct_wadd(sum, a, b); \
2532  dct_wsub(dif, a, b); \
2533  out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2534  out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2535  }
2536 
2537 #define dct_pass(shiftop, shift) \
2538  { \
2539  /* even part */ \
2540  int16x8_t sum26 = vaddq_s16(row2, row6); \
2541  dct_long_mul(p1e, sum26, rot0_0); \
2542  dct_long_mac(t2e, p1e, row6, rot0_1); \
2543  dct_long_mac(t3e, p1e, row2, rot0_2); \
2544  int16x8_t sum04 = vaddq_s16(row0, row4); \
2545  int16x8_t dif04 = vsubq_s16(row0, row4); \
2546  dct_widen(t0e, sum04); \
2547  dct_widen(t1e, dif04); \
2548  dct_wadd(x0, t0e, t3e); \
2549  dct_wsub(x3, t0e, t3e); \
2550  dct_wadd(x1, t1e, t2e); \
2551  dct_wsub(x2, t1e, t2e); \
2552  /* odd part */ \
2553  int16x8_t sum15 = vaddq_s16(row1, row5); \
2554  int16x8_t sum17 = vaddq_s16(row1, row7); \
2555  int16x8_t sum35 = vaddq_s16(row3, row5); \
2556  int16x8_t sum37 = vaddq_s16(row3, row7); \
2557  int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2558  dct_long_mul(p5o, sumodd, rot1_0); \
2559  dct_long_mac(p1o, p5o, sum17, rot1_1); \
2560  dct_long_mac(p2o, p5o, sum35, rot1_2); \
2561  dct_long_mul(p3o, sum37, rot2_0); \
2562  dct_long_mul(p4o, sum15, rot2_1); \
2563  dct_wadd(sump13o, p1o, p3o); \
2564  dct_wadd(sump24o, p2o, p4o); \
2565  dct_wadd(sump23o, p2o, p3o); \
2566  dct_wadd(sump14o, p1o, p4o); \
2567  dct_long_mac(x4, sump13o, row7, rot3_0); \
2568  dct_long_mac(x5, sump24o, row5, rot3_1); \
2569  dct_long_mac(x6, sump23o, row3, rot3_2); \
2570  dct_long_mac(x7, sump14o, row1, rot3_3); \
2571  dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2572  dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2573  dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2574  dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2575  }
2576 
2577  // load
2578  row0 = vld1q_s16(data + 0 * 8);
2579  row1 = vld1q_s16(data + 1 * 8);
2580  row2 = vld1q_s16(data + 2 * 8);
2581  row3 = vld1q_s16(data + 3 * 8);
2582  row4 = vld1q_s16(data + 4 * 8);
2583  row5 = vld1q_s16(data + 5 * 8);
2584  row6 = vld1q_s16(data + 6 * 8);
2585  row7 = vld1q_s16(data + 7 * 8);
2586 
2587  // add DC bias
2588  row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2589 
2590  // column pass
2591  dct_pass(vrshrn_n_s32, 10);
2592 
2593  // 16bit 8x8 transpose
2594  {
2595  // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2596  // whether compilers actually get this is another story, sadly.
2597 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2598 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2599 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2600 
2601  // pass 1
2602  dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2603  dct_trn16(row2, row3);
2604  dct_trn16(row4, row5);
2605  dct_trn16(row6, row7);
2606 
2607  // pass 2
2608  dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2609  dct_trn32(row1, row3);
2610  dct_trn32(row4, row6);
2611  dct_trn32(row5, row7);
2612 
2613  // pass 3
2614  dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2615  dct_trn64(row1, row5);
2616  dct_trn64(row2, row6);
2617  dct_trn64(row3, row7);
2618 
2619 #undef dct_trn16
2620 #undef dct_trn32
2621 #undef dct_trn64
2622  }
2623 
2624  // row pass
2625  // vrshrn_n_s32 only supports shifts up to 16, we need
2626  // 17. so do a non-rounding shift of 16 first then follow
2627  // up with a rounding shift by 1.
2628  dct_pass(vshrn_n_s32, 16);
2629 
2630  {
2631  // pack and round
2632  uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2633  uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2634  uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2635  uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2636  uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2637  uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2638  uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2639  uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2640 
2641  // again, these can translate into one instruction, but often don't.
2642 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2643 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2644 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2645 
2646  // sadly can't use interleaved stores here since we only write
2647  // 8 bytes to each scan line!
2648 
2649  // 8x8 8-bit transpose pass 1
2650  dct_trn8_8(p0, p1);
2651  dct_trn8_8(p2, p3);
2652  dct_trn8_8(p4, p5);
2653  dct_trn8_8(p6, p7);
2654 
2655  // pass 2
2656  dct_trn8_16(p0, p2);
2657  dct_trn8_16(p1, p3);
2658  dct_trn8_16(p4, p6);
2659  dct_trn8_16(p5, p7);
2660 
2661  // pass 3
2662  dct_trn8_32(p0, p4);
2663  dct_trn8_32(p1, p5);
2664  dct_trn8_32(p2, p6);
2665  dct_trn8_32(p3, p7);
2666 
2667  // store
2668  vst1_u8(out, p0); out += out_stride;
2669  vst1_u8(out, p1); out += out_stride;
2670  vst1_u8(out, p2); out += out_stride;
2671  vst1_u8(out, p3); out += out_stride;
2672  vst1_u8(out, p4); out += out_stride;
2673  vst1_u8(out, p5); out += out_stride;
2674  vst1_u8(out, p6); out += out_stride;
2675  vst1_u8(out, p7);
2676 
2677 #undef dct_trn8_8
2678 #undef dct_trn8_16
2679 #undef dct_trn8_32
2680  }
2681 
2682 #undef dct_long_mul
2683 #undef dct_long_mac
2684 #undef dct_widen
2685 #undef dct_wadd
2686 #undef dct_wsub
2687 #undef dct_bfly32o
2688 #undef dct_pass
2689 }
2690 
2691 #endif // STBI_NEON
2692 
2693 #define STBI__MARKER_none 0xff
2694 // if there's a pending marker from the entropy stream, return that
2695 // otherwise, fetch from the stream and get a marker. if there's no
2696 // marker, return 0xff, which is never a valid marker value
2697 static stbi_uc stbi__get_marker(stbi__jpeg *j)
2698 {
2699  stbi_uc x;
2700  if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
2701  x = stbi__get8(j->s);
2702  if (x != 0xff) return STBI__MARKER_none;
2703  while (x == 0xff)
2704  x = stbi__get8(j->s);
2705  return x;
2706 }
2707 
2708 // in each scan, we'll have scan_n components, and the order
2709 // of the components is specified by order[]
2710 #define STBI__RESTART(x) ((x) >= 0xd0 && (x) <= 0xd7)
2711 
2712 // after a restart interval, stbi__jpeg_reset the entropy decoder and
2713 // the dc prediction
2714 static void stbi__jpeg_reset(stbi__jpeg *j)
2715 {
2716  j->code_bits = 0;
2717  j->code_buffer = 0;
2718  j->nomore = 0;
2719  j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
2720  j->marker = STBI__MARKER_none;
2721  j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2722  j->eob_run = 0;
2723  // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
2724  // since we don't even allow 1<<30 pixels
2725 }
2726 
2727 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2728 {
2729  stbi__jpeg_reset(z);
2730  if (!z->progressive) {
2731  if (z->scan_n == 1) {
2732  int i, j;
2733  STBI_SIMD_ALIGN(short, data[64]);
2734  int n = z->order[0];
2735  // non-interleaved data, we just need to process one block at a time,
2736  // in trivial scanline order
2737  // number of blocks to do just depends on how many actual "pixels" this
2738  // component has, independent of interleaved MCU blocking and such
2739  int w = (z->img_comp[n].x + 7) >> 3;
2740  int h = (z->img_comp[n].y + 7) >> 3;
2741  for (j = 0; j < h; ++j) {
2742  for (i = 0; i < w; ++i) {
2743  int ha = z->img_comp[n].ha;
2744  if (!stbi__jpeg_decode_block(z, data, z->huff_dc + z->img_comp[n].hd, z->huff_ac + ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2745  z->idct_block_kernel(z->img_comp[n].data + z->img_comp[n].w2*j * 8 + i * 8, z->img_comp[n].w2, data);
2746  // every data block is an MCU, so countdown the restart interval
2747  if (--z->todo <= 0) {
2748  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2749  // if it's NOT a restart, then just bail, so we get corrupt data
2750  // rather than no data
2751  if (!STBI__RESTART(z->marker)) return 1;
2752  stbi__jpeg_reset(z);
2753  }
2754  }
2755  }
2756  return 1;
2757  }
2758  else { // interleaved
2759  int i, j, k, x, y;
2760  STBI_SIMD_ALIGN(short, data[64]);
2761  for (j = 0; j < z->img_mcu_y; ++j) {
2762  for (i = 0; i < z->img_mcu_x; ++i) {
2763  // scan an interleaved mcu... process scan_n components in order
2764  for (k = 0; k < z->scan_n; ++k) {
2765  int n = z->order[k];
2766  // scan out an mcu's worth of this component; that's just determined
2767  // by the basic H and V specified for the component
2768  for (y = 0; y < z->img_comp[n].v; ++y) {
2769  for (x = 0; x < z->img_comp[n].h; ++x) {
2770  int x2 = (i*z->img_comp[n].h + x) * 8;
2771  int y2 = (j*z->img_comp[n].v + y) * 8;
2772  int ha = z->img_comp[n].ha;
2773  if (!stbi__jpeg_decode_block(z, data, z->huff_dc + z->img_comp[n].hd, z->huff_ac + ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2774  z->idct_block_kernel(z->img_comp[n].data + z->img_comp[n].w2*y2 + x2, z->img_comp[n].w2, data);
2775  }
2776  }
2777  }
2778  // after all interleaved components, that's an interleaved MCU,
2779  // so now count down the restart interval
2780  if (--z->todo <= 0) {
2781  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2782  if (!STBI__RESTART(z->marker)) return 1;
2783  stbi__jpeg_reset(z);
2784  }
2785  }
2786  }
2787  return 1;
2788  }
2789  }
2790  else {
2791  if (z->scan_n == 1) {
2792  int i, j;
2793  int n = z->order[0];
2794  // non-interleaved data, we just need to process one block at a time,
2795  // in trivial scanline order
2796  // number of blocks to do just depends on how many actual "pixels" this
2797  // component has, independent of interleaved MCU blocking and such
2798  int w = (z->img_comp[n].x + 7) >> 3;
2799  int h = (z->img_comp[n].y + 7) >> 3;
2800  for (j = 0; j < h; ++j) {
2801  for (i = 0; i < w; ++i) {
2802  short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2803  if (z->spec_start == 0) {
2804  if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2805  return 0;
2806  }
2807  else {
2808  int ha = z->img_comp[n].ha;
2809  if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2810  return 0;
2811  }
2812  // every data block is an MCU, so countdown the restart interval
2813  if (--z->todo <= 0) {
2814  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2815  if (!STBI__RESTART(z->marker)) return 1;
2816  stbi__jpeg_reset(z);
2817  }
2818  }
2819  }
2820  return 1;
2821  }
2822  else { // interleaved
2823  int i, j, k, x, y;
2824  for (j = 0; j < z->img_mcu_y; ++j) {
2825  for (i = 0; i < z->img_mcu_x; ++i) {
2826  // scan an interleaved mcu... process scan_n components in order
2827  for (k = 0; k < z->scan_n; ++k) {
2828  int n = z->order[k];
2829  // scan out an mcu's worth of this component; that's just determined
2830  // by the basic H and V specified for the component
2831  for (y = 0; y < z->img_comp[n].v; ++y) {
2832  for (x = 0; x < z->img_comp[n].h; ++x) {
2833  int x2 = (i*z->img_comp[n].h + x);
2834  int y2 = (j*z->img_comp[n].v + y);
2835  short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2836  if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2837  return 0;
2838  }
2839  }
2840  }
2841  // after all interleaved components, that's an interleaved MCU,
2842  // so now count down the restart interval
2843  if (--z->todo <= 0) {
2844  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2845  if (!STBI__RESTART(z->marker)) return 1;
2846  stbi__jpeg_reset(z);
2847  }
2848  }
2849  }
2850  return 1;
2851  }
2852  }
2853 }
2854 
2855 static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
2856 {
2857  int i;
2858  for (i = 0; i < 64; ++i)
2859  data[i] *= dequant[i];
2860 }
2861 
2862 static void stbi__jpeg_finish(stbi__jpeg *z)
2863 {
2864  if (z->progressive) {
2865  // dequantize and idct the data
2866  int i, j, n;
2867  for (n = 0; n < z->s->img_n; ++n) {
2868  int w = (z->img_comp[n].x + 7) >> 3;
2869  int h = (z->img_comp[n].y + 7) >> 3;
2870  for (j = 0; j < h; ++j) {
2871  for (i = 0; i < w; ++i) {
2872  short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2873  stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2874  z->idct_block_kernel(z->img_comp[n].data + z->img_comp[n].w2*j * 8 + i * 8, z->img_comp[n].w2, data);
2875  }
2876  }
2877  }
2878  }
2879 }
2880 
2881 static int stbi__process_marker(stbi__jpeg *z, int m)
2882 {
2883  int L;
2884  switch (m) {
2885  case STBI__MARKER_none: // no marker found
2886  return stbi__err("expected marker", "Corrupt JPEG");
2887 
2888  case 0xDD: // DRI - specify restart interval
2889  if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len", "Corrupt JPEG");
2890  z->restart_interval = stbi__get16be(z->s);
2891  return 1;
2892 
2893  case 0xDB: // DQT - define quantization table
2894  L = stbi__get16be(z->s) - 2;
2895  while (L > 0) {
2896  int q = stbi__get8(z->s);
2897  int p = q >> 4;
2898  int t = q & 15, i;
2899  if (p != 0) return stbi__err("bad DQT type", "Corrupt JPEG");
2900  if (t > 3) return stbi__err("bad DQT table", "Corrupt JPEG");
2901  for (i = 0; i < 64; ++i)
2902  z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
2903  L -= 65;
2904  }
2905  return L == 0;
2906 
2907  case 0xC4: // DHT - define huffman table
2908  L = stbi__get16be(z->s) - 2;
2909  while (L > 0) {
2910  stbi_uc *v;
2911  int sizes[16], i, n = 0;
2912  int q = stbi__get8(z->s);
2913  int tc = q >> 4;
2914  int th = q & 15;
2915  if (tc > 1 || th > 3) return stbi__err("bad DHT header", "Corrupt JPEG");
2916  for (i = 0; i < 16; ++i) {
2917  sizes[i] = stbi__get8(z->s);
2918  n += sizes[i];
2919  }
2920  L -= 17;
2921  if (tc == 0) {
2922  if (!stbi__build_huffman(z->huff_dc + th, sizes)) return 0;
2923  v = z->huff_dc[th].values;
2924  }
2925  else {
2926  if (!stbi__build_huffman(z->huff_ac + th, sizes)) return 0;
2927  v = z->huff_ac[th].values;
2928  }
2929  for (i = 0; i < n; ++i)
2930  v[i] = stbi__get8(z->s);
2931  if (tc != 0)
2932  stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
2933  L -= n;
2934  }
2935  return L == 0;
2936  }
2937  // check for comment block or APP blocks
2938  if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
2939  stbi__skip(z->s, stbi__get16be(z->s) - 2);
2940  return 1;
2941  }
2942  return 0;
2943 }
2944 
2945 // after we see SOS
2946 static int stbi__process_scan_header(stbi__jpeg *z)
2947 {
2948  int i;
2949  int Ls = stbi__get16be(z->s);
2950  z->scan_n = stbi__get8(z->s);
2951  if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int)z->s->img_n) return stbi__err("bad SOS component count", "Corrupt JPEG");
2952  if (Ls != 6 + 2 * z->scan_n) return stbi__err("bad SOS len", "Corrupt JPEG");
2953  for (i = 0; i < z->scan_n; ++i) {
2954  int id = stbi__get8(z->s), which;
2955  int q = stbi__get8(z->s);
2956  for (which = 0; which < z->s->img_n; ++which)
2957  if (z->img_comp[which].id == id)
2958  break;
2959  if (which == z->s->img_n) return 0; // no match
2960  z->img_comp[which].hd = q >> 4; if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff", "Corrupt JPEG");
2961  z->img_comp[which].ha = q & 15; if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff", "Corrupt JPEG");
2962  z->order[i] = which;
2963  }
2964 
2965  {
2966  int aa;
2967  z->spec_start = stbi__get8(z->s);
2968  z->spec_end = stbi__get8(z->s); // should be 63, but might be 0
2969  aa = stbi__get8(z->s);
2970  z->succ_high = (aa >> 4);
2971  z->succ_low = (aa & 15);
2972  if (z->progressive) {
2973  if (z->spec_start > 63 || z->spec_end > 63 || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
2974  return stbi__err("bad SOS", "Corrupt JPEG");
2975  }
2976  else {
2977  if (z->spec_start != 0) return stbi__err("bad SOS", "Corrupt JPEG");
2978  if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS", "Corrupt JPEG");
2979  z->spec_end = 63;
2980  }
2981  }
2982 
2983  return 1;
2984 }
2985 
2986 static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
2987 {
2988  int i;
2989  for (i = 0; i < ncomp; ++i) {
2990  if (z->img_comp[i].raw_data) {
2991  STBI_FREE(z->img_comp[i].raw_data);
2992  z->img_comp[i].raw_data = NULL;
2993  z->img_comp[i].data = NULL;
2994  }
2995  if (z->img_comp[i].raw_coeff) {
2996  STBI_FREE(z->img_comp[i].raw_coeff);
2997  z->img_comp[i].raw_coeff = 0;
2998  z->img_comp[i].coeff = 0;
2999  }
3000  if (z->img_comp[i].linebuf) {
3001  STBI_FREE(z->img_comp[i].linebuf);
3002  z->img_comp[i].linebuf = NULL;
3003  }
3004  }
3005  return why;
3006 }
3007 
3008 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
3009 {
3010  stbi__context *s = z->s;
3011  int Lf, p, i, q, h_max = 1, v_max = 1, c;
3012  Lf = stbi__get16be(s); if (Lf < 11) return stbi__err("bad SOF len", "Corrupt JPEG"); // JPEG
3013  p = stbi__get8(s); if (p != 8) return stbi__err("only 8-bit", "JPEG format not supported: 8-bit only"); // JPEG baseline
3014  s->img_y = stbi__get16be(s); if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
3015  s->img_x = stbi__get16be(s); if (s->img_x == 0) return stbi__err("0 width", "Corrupt JPEG"); // JPEG requires
3016  c = stbi__get8(s);
3017  if (c != 3 && c != 1) return stbi__err("bad component count", "Corrupt JPEG"); // JFIF requires
3018  s->img_n = c;
3019  for (i = 0; i < c; ++i) {
3020  z->img_comp[i].data = NULL;
3021  z->img_comp[i].linebuf = NULL;
3022  }
3023 
3024  if (Lf != 8 + 3 * s->img_n) return stbi__err("bad SOF len", "Corrupt JPEG");
3025 
3026  z->rgb = 0;
3027  for (i = 0; i < s->img_n; ++i) {
3028  static unsigned char rgb[3] = { 'R', 'G', 'B' };
3029  z->img_comp[i].id = stbi__get8(s);
3030  if (z->img_comp[i].id != i + 1) // JFIF requires
3031  if (z->img_comp[i].id != i) { // some version of jpegtran outputs non-JFIF-compliant files!
3032  // somethings output this (see http://fileformats.archiveteam.org/wiki/JPEG#Color_format)
3033  if (z->img_comp[i].id != rgb[i])
3034  return stbi__err("bad component ID", "Corrupt JPEG");
3035  ++z->rgb;
3036  }
3037  q = stbi__get8(s);
3038  z->img_comp[i].h = (q >> 4); if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H", "Corrupt JPEG");
3039  z->img_comp[i].v = q & 15; if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V", "Corrupt JPEG");
3040  z->img_comp[i].tq = stbi__get8(s); if (z->img_comp[i].tq > 3) return stbi__err("bad TQ", "Corrupt JPEG");
3041  }
3042 
3043  if (scan != STBI__SCAN_load) return 1;
3044 
3045  if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
3046 
3047  for (i = 0; i < s->img_n; ++i) {
3048  if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
3049  if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
3050  }
3051 
3052  // compute interleaved mcu info
3053  z->img_h_max = h_max;
3054  z->img_v_max = v_max;
3055  z->img_mcu_w = h_max * 8;
3056  z->img_mcu_h = v_max * 8;
3057  // these sizes can't be more than 17 bits
3058  z->img_mcu_x = (s->img_x + z->img_mcu_w - 1) / z->img_mcu_w;
3059  z->img_mcu_y = (s->img_y + z->img_mcu_h - 1) / z->img_mcu_h;
3060 
3061  for (i = 0; i < s->img_n; ++i) {
3062  // number of effective pixels (e.g. for non-interleaved MCU)
3063  z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max - 1) / h_max;
3064  z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max - 1) / v_max;
3065  // to simplify generation, we'll allocate enough memory to decode
3066  // the bogus oversized data from using interleaved MCUs and their
3067  // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
3068  // discard the extra data until colorspace conversion
3069  //
3070  // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
3071  // so these muls can't overflow with 32-bit ints (which we require)
3072  z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
3073  z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
3074  z->img_comp[i].coeff = 0;
3075  z->img_comp[i].raw_coeff = 0;
3076  z->img_comp[i].linebuf = NULL;
3077  z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
3078  if (z->img_comp[i].raw_data == NULL)
3079  return stbi__free_jpeg_components(z, i + 1, stbi__err("outofmem", "Out of memory"));
3080  // align blocks for idct using mmx/sse
3081  z->img_comp[i].data = (stbi_uc*)(((size_t)z->img_comp[i].raw_data + 15) & ~15);
3082  if (z->progressive) {
3083  // w2, h2 are multiples of 8 (see above)
3084  z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
3085  z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
3086  z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
3087  if (z->img_comp[i].raw_coeff == NULL)
3088  return stbi__free_jpeg_components(z, i + 1, stbi__err("outofmem", "Out of memory"));
3089  z->img_comp[i].coeff = (short*)(((size_t)z->img_comp[i].raw_coeff + 15) & ~15);
3090  }
3091  }
3092 
3093  return 1;
3094 }
3095 
3096 // use comparisons since in some cases we handle more than one case (e.g. SOF)
3097 #define stbi__DNL(x) ((x) == 0xdc)
3098 #define stbi__SOI(x) ((x) == 0xd8)
3099 #define stbi__EOI(x) ((x) == 0xd9)
3100 #define stbi__SOF(x) ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
3101 #define stbi__SOS(x) ((x) == 0xda)
3102 
3103 #define stbi__SOF_progressive(x) ((x) == 0xc2)
3104 
3105 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
3106 {
3107  int m;
3108  z->marker = STBI__MARKER_none; // initialize cached marker to empty
3109  m = stbi__get_marker(z);
3110  if (!stbi__SOI(m)) return stbi__err("no SOI", "Corrupt JPEG");
3111  if (scan == STBI__SCAN_type) return 1;
3112  m = stbi__get_marker(z);
3113  while (!stbi__SOF(m)) {
3114  if (!stbi__process_marker(z, m)) return 0;
3115  m = stbi__get_marker(z);
3116  while (m == STBI__MARKER_none) {
3117  // some files have extra padding after their blocks, so ok, we'll scan
3118  if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
3119  m = stbi__get_marker(z);
3120  }
3121  }
3122  z->progressive = stbi__SOF_progressive(m);
3123  if (!stbi__process_frame_header(z, scan)) return 0;
3124  return 1;
3125 }
3126 
3127 // decode image to YCbCr format
3128 static int stbi__decode_jpeg_image(stbi__jpeg *j)
3129 {
3130  int m;
3131  for (m = 0; m < 4; m++) {
3132  j->img_comp[m].raw_data = NULL;
3133  j->img_comp[m].raw_coeff = NULL;
3134  }
3135  j->restart_interval = 0;
3136  if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
3137  m = stbi__get_marker(j);
3138  while (!stbi__EOI(m)) {
3139  if (stbi__SOS(m)) {
3140  if (!stbi__process_scan_header(j)) return 0;
3141  if (!stbi__parse_entropy_coded_data(j)) return 0;
3142  if (j->marker == STBI__MARKER_none) {
3143  // handle 0s at the end of image data from IP Kamera 9060
3144  while (!stbi__at_eof(j->s)) {
3145  int x = stbi__get8(j->s);
3146  if (x == 255) {
3147  j->marker = stbi__get8(j->s);
3148  break;
3149  }
3150  else if (x != 0) {
3151  return stbi__err("junk before marker", "Corrupt JPEG");
3152  }
3153  }
3154  // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
3155  }
3156  }
3157  else {
3158  if (!stbi__process_marker(j, m)) return 0;
3159  }
3160  m = stbi__get_marker(j);
3161  }
3162  if (j->progressive)
3163  stbi__jpeg_finish(j);
3164  return 1;
3165 }
3166 
3167 // static jfif-centered resampling (across block boundaries)
3168 
3169 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
3170  int w, int hs);
3171 
3172 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
3173 
3174 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3175 {
3176  STBI_NOTUSED(out);
3177  STBI_NOTUSED(in_far);
3178  STBI_NOTUSED(w);
3179  STBI_NOTUSED(hs);
3180  return in_near;
3181 }
3182 
3183 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3184 {
3185  // need to generate two samples vertically for every one in input
3186  int i;
3187  STBI_NOTUSED(hs);
3188  for (i = 0; i < w; ++i)
3189  out[i] = stbi__div4(3 * in_near[i] + in_far[i] + 2);
3190  return out;
3191 }
3192 
3193 static stbi_uc* stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3194 {
3195  // need to generate two samples horizontally for every one in input
3196  int i;
3197  stbi_uc *input = in_near;
3198 
3199  if (w == 1) {
3200  // if only one sample, can't do any interpolation
3201  out[0] = out[1] = input[0];
3202  return out;
3203  }
3204 
3205  out[0] = input[0];
3206  out[1] = stbi__div4(input[0] * 3 + input[1] + 2);
3207  for (i = 1; i < w - 1; ++i) {
3208  int n = 3 * input[i] + 2;
3209  out[i * 2 + 0] = stbi__div4(n + input[i - 1]);
3210  out[i * 2 + 1] = stbi__div4(n + input[i + 1]);
3211  }
3212  out[i * 2 + 0] = stbi__div4(input[w - 2] * 3 + input[w - 1] + 2);
3213  out[i * 2 + 1] = input[w - 1];
3214 
3215  STBI_NOTUSED(in_far);
3216  STBI_NOTUSED(hs);
3217 
3218  return out;
3219 }
3220 
3221 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
3222 
3223 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3224 {
3225  // need to generate 2x2 samples for every one in input
3226  int i, t0, t1;
3227  if (w == 1) {
3228  out[0] = out[1] = stbi__div4(3 * in_near[0] + in_far[0] + 2);
3229  return out;
3230  }
3231 
3232  t1 = 3 * in_near[0] + in_far[0];
3233  out[0] = stbi__div4(t1 + 2);
3234  for (i = 1; i < w; ++i) {
3235  t0 = t1;
3236  t1 = 3 * in_near[i] + in_far[i];
3237  out[i * 2 - 1] = stbi__div16(3 * t0 + t1 + 8);
3238  out[i * 2] = stbi__div16(3 * t1 + t0 + 8);
3239  }
3240  out[w * 2 - 1] = stbi__div4(t1 + 2);
3241 
3242  STBI_NOTUSED(hs);
3243 
3244  return out;
3245 }
3246 
3247 #if defined(STBI_SSE2) || defined(STBI_NEON)
3248 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3249 {
3250  // need to generate 2x2 samples for every one in input
3251  int i = 0, t0, t1;
3252 
3253  if (w == 1) {
3254  out[0] = out[1] = stbi__div4(3 * in_near[0] + in_far[0] + 2);
3255  return out;
3256  }
3257 
3258  t1 = 3 * in_near[0] + in_far[0];
3259  // process groups of 8 pixels for as long as we can.
3260  // note we can't handle the last pixel in a row in this loop
3261  // because we need to handle the filter boundary conditions.
3262  for (; i < ((w - 1) & ~7); i += 8) {
3263 #if defined(STBI_SSE2)
3264  // load and perform the vertical filtering pass
3265  // this uses 3*x + y = 4*x + (y - x)
3266  __m128i zero = _mm_setzero_si128();
3267  __m128i farb = _mm_loadl_epi64((__m128i *) (in_far + i));
3268  __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
3269  __m128i farw = _mm_unpacklo_epi8(farb, zero);
3270  __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
3271  __m128i diff = _mm_sub_epi16(farw, nearw);
3272  __m128i nears = _mm_slli_epi16(nearw, 2);
3273  __m128i curr = _mm_add_epi16(nears, diff); // current row
3274 
3275  // horizontal filter works the same based on shifted vers of current
3276  // row. "prev" is current row shifted right by 1 pixel; we need to
3277  // insert the previous pixel value (from t1).
3278  // "next" is current row shifted left by 1 pixel, with first pixel
3279  // of next block of 8 pixels added in.
3280  __m128i prv0 = _mm_slli_si128(curr, 2);
3281  __m128i nxt0 = _mm_srli_si128(curr, 2);
3282  __m128i prev = _mm_insert_epi16(prv0, t1, 0);
3283  __m128i next = _mm_insert_epi16(nxt0, 3 * in_near[i + 8] + in_far[i + 8], 7);
3284 
3285  // horizontal filter, polyphase implementation since it's convenient:
3286  // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3287  // odd pixels = 3*cur + next = cur*4 + (next - cur)
3288  // note the shared term.
3289  __m128i bias = _mm_set1_epi16(8);
3290  __m128i curs = _mm_slli_epi16(curr, 2);
3291  __m128i prvd = _mm_sub_epi16(prev, curr);
3292  __m128i nxtd = _mm_sub_epi16(next, curr);
3293  __m128i curb = _mm_add_epi16(curs, bias);
3294  __m128i even = _mm_add_epi16(prvd, curb);
3295  __m128i odd = _mm_add_epi16(nxtd, curb);
3296 
3297  // interleave even and odd pixels, then undo scaling.
3298  __m128i int0 = _mm_unpacklo_epi16(even, odd);
3299  __m128i int1 = _mm_unpackhi_epi16(even, odd);
3300  __m128i de0 = _mm_srli_epi16(int0, 4);
3301  __m128i de1 = _mm_srli_epi16(int1, 4);
3302 
3303  // pack and write output
3304  __m128i outv = _mm_packus_epi16(de0, de1);
3305  _mm_storeu_si128((__m128i *) (out + i * 2), outv);
3306 #elif defined(STBI_NEON)
3307  // load and perform the vertical filtering pass
3308  // this uses 3*x + y = 4*x + (y - x)
3309  uint8x8_t farb = vld1_u8(in_far + i);
3310  uint8x8_t nearb = vld1_u8(in_near + i);
3311  int16x8_t diff = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3312  int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3313  int16x8_t curr = vaddq_s16(nears, diff); // current row
3314 
3315  // horizontal filter works the same based on shifted vers of current
3316  // row. "prev" is current row shifted right by 1 pixel; we need to
3317  // insert the previous pixel value (from t1).
3318  // "next" is current row shifted left by 1 pixel, with first pixel
3319  // of next block of 8 pixels added in.
3320  int16x8_t prv0 = vextq_s16(curr, curr, 7);
3321  int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3322  int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3323  int16x8_t next = vsetq_lane_s16(3 * in_near[i + 8] + in_far[i + 8], nxt0, 7);
3324 
3325  // horizontal filter, polyphase implementation since it's convenient:
3326  // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3327  // odd pixels = 3*cur + next = cur*4 + (next - cur)
3328  // note the shared term.
3329  int16x8_t curs = vshlq_n_s16(curr, 2);
3330  int16x8_t prvd = vsubq_s16(prev, curr);
3331  int16x8_t nxtd = vsubq_s16(next, curr);
3332  int16x8_t even = vaddq_s16(curs, prvd);
3333  int16x8_t odd = vaddq_s16(curs, nxtd);
3334 
3335  // undo scaling and round, then store with even/odd phases interleaved
3336  uint8x8x2_t o;
3337  o.val[0] = vqrshrun_n_s16(even, 4);
3338  o.val[1] = vqrshrun_n_s16(odd, 4);
3339  vst2_u8(out + i * 2, o);
3340 #endif
3341 
3342  // "previous" value for next iter
3343  t1 = 3 * in_near[i + 7] + in_far[i + 7];
3344  }
3345 
3346  t0 = t1;
3347  t1 = 3 * in_near[i] + in_far[i];
3348  out[i * 2] = stbi__div16(3 * t1 + t0 + 8);
3349 
3350  for (++i; i < w; ++i) {
3351  t0 = t1;
3352  t1 = 3 * in_near[i] + in_far[i];
3353  out[i * 2 - 1] = stbi__div16(3 * t0 + t1 + 8);
3354  out[i * 2] = stbi__div16(3 * t1 + t0 + 8);
3355  }
3356  out[w * 2 - 1] = stbi__div4(t1 + 2);
3357 
3358  STBI_NOTUSED(hs);
3359 
3360  return out;
3361 }
3362 #endif
3363 
3364 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3365 {
3366  // resample with nearest-neighbor
3367  int i, j;
3368  STBI_NOTUSED(in_far);
3369  for (i = 0; i < w; ++i)
3370  for (j = 0; j < hs; ++j)
3371  out[i*hs + j] = in_near[i];
3372  return out;
3373 }
3374 
3375 #ifdef STBI_JPEG_OLD
3376 // this is the same YCbCr-to-RGB calculation that stb_image has used
3377 // historically before the algorithm changes in 1.49
3378 #define float2fixed(x) ((int) ((x) * 65536 + 0.5))
3379 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3380 {
3381  int i;
3382  for (i = 0; i < count; ++i) {
3383  int y_fixed = (y[i] << 16) + 32768; // rounding
3384  int r, g, b;
3385  int cr = pcr[i] - 128;
3386  int cb = pcb[i] - 128;
3387  r = y_fixed + cr*float2fixed(1.40200f);
3388  g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
3389  b = y_fixed + cb*float2fixed(1.77200f);
3390  r >>= 16;
3391  g >>= 16;
3392  b >>= 16;
3393  if ((unsigned)r > 255) { if (r < 0) r = 0; else r = 255; }
3394  if ((unsigned)g > 255) { if (g < 0) g = 0; else g = 255; }
3395  if ((unsigned)b > 255) { if (b < 0) b = 0; else b = 255; }
3396  out[0] = (stbi_uc)r;
3397  out[1] = (stbi_uc)g;
3398  out[2] = (stbi_uc)b;
3399  out[3] = 255;
3400  out += step;
3401  }
3402 }
3403 #else
3404 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
3405 // to make sure the code produces the same results in both SIMD and scalar
3406 #define float2fixed(x) (((int) ((x) * 4096.0f + 0.5f)) << 8)
3407 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3408 {
3409  int i;
3410  for (i = 0; i < count; ++i) {
3411  int y_fixed = (y[i] << 20) + (1 << 19); // rounding
3412  int r, g, b;
3413  int cr = pcr[i] - 128;
3414  int cb = pcb[i] - 128;
3415  r = y_fixed + cr* float2fixed(1.40200f);
3416  g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3417  b = y_fixed + cb* float2fixed(1.77200f);
3418  r >>= 20;
3419  g >>= 20;
3420  b >>= 20;
3421  if ((unsigned)r > 255) { if (r < 0) r = 0; else r = 255; }
3422  if ((unsigned)g > 255) { if (g < 0) g = 0; else g = 255; }
3423  if ((unsigned)b > 255) { if (b < 0) b = 0; else b = 255; }
3424  out[0] = (stbi_uc)r;
3425  out[1] = (stbi_uc)g;
3426  out[2] = (stbi_uc)b;
3427  out[3] = 255;
3428  out += step;
3429  }
3430 }
3431 #endif
3432 
3433 #if defined(STBI_SSE2) || defined(STBI_NEON)
3434 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
3435 {
3436  int i = 0;
3437 
3438 #ifdef STBI_SSE2
3439  // step == 3 is pretty ugly on the final interleave, and i'm not convinced
3440  // it's useful in practice (you wouldn't use it for textures, for example).
3441  // so just accelerate step == 4 case.
3442  if (step == 4) {
3443  // this is a fairly straightforward implementation and not super-optimized.
3444  __m128i signflip = _mm_set1_epi8(-0x80);
3445  __m128i cr_const0 = _mm_set1_epi16((short)(1.40200f*4096.0f + 0.5f));
3446  __m128i cr_const1 = _mm_set1_epi16(-(short)(0.71414f*4096.0f + 0.5f));
3447  __m128i cb_const0 = _mm_set1_epi16(-(short)(0.34414f*4096.0f + 0.5f));
3448  __m128i cb_const1 = _mm_set1_epi16((short)(1.77200f*4096.0f + 0.5f));
3449  __m128i y_bias = _mm_set1_epi8((char)(unsigned char)128);
3450  __m128i xw = _mm_set1_epi16(255); // alpha channel
3451 
3452  for (; i + 7 < count; i += 8) {
3453  // load
3454  __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y + i));
3455  __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr + i));
3456  __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb + i));
3457  __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
3458  __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
3459 
3460  // unpack to short (and left-shift cr, cb by 8)
3461  __m128i yw = _mm_unpacklo_epi8(y_bias, y_bytes);
3462  __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
3463  __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
3464 
3465  // color transform
3466  __m128i yws = _mm_srli_epi16(yw, 4);
3467  __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
3468  __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
3469  __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
3470  __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
3471  __m128i rws = _mm_add_epi16(cr0, yws);
3472  __m128i gwt = _mm_add_epi16(cb0, yws);
3473  __m128i bws = _mm_add_epi16(yws, cb1);
3474  __m128i gws = _mm_add_epi16(gwt, cr1);
3475 
3476  // descale
3477  __m128i rw = _mm_srai_epi16(rws, 4);
3478  __m128i bw = _mm_srai_epi16(bws, 4);
3479  __m128i gw = _mm_srai_epi16(gws, 4);
3480 
3481  // back to byte, set up for transpose
3482  __m128i brb = _mm_packus_epi16(rw, bw);
3483  __m128i gxb = _mm_packus_epi16(gw, xw);
3484 
3485  // transpose to interleave channels
3486  __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
3487  __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
3488  __m128i o0 = _mm_unpacklo_epi16(t0, t1);
3489  __m128i o1 = _mm_unpackhi_epi16(t0, t1);
3490 
3491  // store
3492  _mm_storeu_si128((__m128i *) (out + 0), o0);
3493  _mm_storeu_si128((__m128i *) (out + 16), o1);
3494  out += 32;
3495  }
3496  }
3497 #endif
3498 
3499 #ifdef STBI_NEON
3500  // in this version, step=3 support would be easy to add. but is there demand?
3501  if (step == 4) {
3502  // this is a fairly straightforward implementation and not super-optimized.
3503  uint8x8_t signflip = vdup_n_u8(0x80);
3504  int16x8_t cr_const0 = vdupq_n_s16((short)(1.40200f*4096.0f + 0.5f));
3505  int16x8_t cr_const1 = vdupq_n_s16(-(short)(0.71414f*4096.0f + 0.5f));
3506  int16x8_t cb_const0 = vdupq_n_s16(-(short)(0.34414f*4096.0f + 0.5f));
3507  int16x8_t cb_const1 = vdupq_n_s16((short)(1.77200f*4096.0f + 0.5f));
3508 
3509  for (; i + 7 < count; i += 8) {
3510  // load
3511  uint8x8_t y_bytes = vld1_u8(y + i);
3512  uint8x8_t cr_bytes = vld1_u8(pcr + i);
3513  uint8x8_t cb_bytes = vld1_u8(pcb + i);
3514  int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
3515  int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
3516 
3517  // expand to s16
3518  int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
3519  int16x8_t crw = vshll_n_s8(cr_biased, 7);
3520  int16x8_t cbw = vshll_n_s8(cb_biased, 7);
3521 
3522  // color transform
3523  int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
3524  int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
3525  int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
3526  int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
3527  int16x8_t rws = vaddq_s16(yws, cr0);
3528  int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
3529  int16x8_t bws = vaddq_s16(yws, cb1);
3530 
3531  // undo scaling, round, convert to byte
3532  uint8x8x4_t o;
3533  o.val[0] = vqrshrun_n_s16(rws, 4);
3534  o.val[1] = vqrshrun_n_s16(gws, 4);
3535  o.val[2] = vqrshrun_n_s16(bws, 4);
3536  o.val[3] = vdup_n_u8(255);
3537 
3538  // store, interleaving r/g/b/a
3539  vst4_u8(out, o);
3540  out += 8 * 4;
3541  }
3542  }
3543 #endif
3544 
3545  for (; i < count; ++i) {
3546  int y_fixed = (y[i] << 20) + (1 << 19); // rounding
3547  int r, g, b;
3548  int cr = pcr[i] - 128;
3549  int cb = pcb[i] - 128;
3550  r = y_fixed + cr* float2fixed(1.40200f);
3551  g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3552  b = y_fixed + cb* float2fixed(1.77200f);
3553  r >>= 20;
3554  g >>= 20;
3555  b >>= 20;
3556  if ((unsigned)r > 255) { if (r < 0) r = 0; else r = 255; }
3557  if ((unsigned)g > 255) { if (g < 0) g = 0; else g = 255; }
3558  if ((unsigned)b > 255) { if (b < 0) b = 0; else b = 255; }
3559  out[0] = (stbi_uc)r;
3560  out[1] = (stbi_uc)g;
3561  out[2] = (stbi_uc)b;
3562  out[3] = 255;
3563  out += step;
3564  }
3565 }
3566 #endif
3567 
3568 // set up the kernels
3569 static void stbi__setup_jpeg(stbi__jpeg *j)
3570 {
3571  j->idct_block_kernel = stbi__idct_block;
3572  j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
3573  j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
3574 
3575 #ifdef STBI_SSE2
3576  if (stbi__sse2_available()) {
3577  j->idct_block_kernel = stbi__idct_simd;
3578 #ifndef STBI_JPEG_OLD
3579  j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3580 #endif
3581  j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3582  }
3583 #endif
3584 
3585 #ifdef STBI_NEON
3586  j->idct_block_kernel = stbi__idct_simd;
3587 #ifndef STBI_JPEG_OLD
3588  j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3589 #endif
3590  j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3591 #endif
3592 }
3593 
3594 // clean up the temporary component buffers
3595 static void stbi__cleanup_jpeg(stbi__jpeg *j)
3596 {
3597  stbi__free_jpeg_components(j, j->s->img_n, 0);
3598 }
3599 
3600 typedef struct
3601 {
3602  resample_row_func resample;
3603  stbi_uc *line0, *line1;
3604  int hs, vs; // expansion factor in each axis
3605  int w_lores; // horizontal pixels pre-expansion
3606  int ystep; // how far through vertical expansion we are
3607  int ypos; // which pre-expansion row we're on
3608 } stbi__resample;
3609 
3610 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
3611 {
3612  int n, decode_n;
3613  z->s->img_n = 0; // make stbi__cleanup_jpeg safe
3614 
3615  // validate req_comp
3616  if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
3617 
3618  // load a jpeg image from whichever source, but leave in YCbCr format
3619  if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
3620 
3621  // determine actual number of components to generate
3622  n = req_comp ? req_comp : z->s->img_n;
3623 
3624  if (z->s->img_n == 3 && n < 3)
3625  decode_n = 1;
3626  else
3627  decode_n = z->s->img_n;
3628 
3629  // resample and color-convert
3630  {
3631  int k;
3632  unsigned int i, j;
3633  stbi_uc *output;
3634  stbi_uc *coutput[4];
3635 
3636  stbi__resample res_comp[4];
3637 
3638  for (k = 0; k < decode_n; ++k) {
3639  stbi__resample *r = &res_comp[k];
3640 
3641  // allocate line buffer big enough for upsampling off the edges
3642  // with upsample factor of 4
3643  z->img_comp[k].linebuf = (stbi_uc *)stbi__malloc(z->s->img_x + 3);
3644  if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3645 
3646  r->hs = z->img_h_max / z->img_comp[k].h;
3647  r->vs = z->img_v_max / z->img_comp[k].v;
3648  r->ystep = r->vs >> 1;
3649  r->w_lores = (z->s->img_x + r->hs - 1) / r->hs;
3650  r->ypos = 0;
3651  r->line0 = r->line1 = z->img_comp[k].data;
3652 
3653  if (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
3654  else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
3655  else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
3656  else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
3657  else r->resample = stbi__resample_row_generic;
3658  }
3659 
3660  // can't error after this so, this is safe
3661  output = (stbi_uc *)stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
3662  if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3663 
3664  // now go ahead and resample
3665  for (j = 0; j < z->s->img_y; ++j) {
3666  stbi_uc *out = output + n * z->s->img_x * j;
3667  for (k = 0; k < decode_n; ++k) {
3668  stbi__resample *r = &res_comp[k];
3669  int y_bot = r->ystep >= (r->vs >> 1);
3670  coutput[k] = r->resample(z->img_comp[k].linebuf,
3671  y_bot ? r->line1 : r->line0,
3672  y_bot ? r->line0 : r->line1,
3673  r->w_lores, r->hs);
3674  if (++r->ystep >= r->vs) {
3675  r->ystep = 0;
3676  r->line0 = r->line1;
3677  if (++r->ypos < z->img_comp[k].y)
3678  r->line1 += z->img_comp[k].w2;
3679  }
3680  }
3681  if (n >= 3) {
3682  stbi_uc *y = coutput[0];
3683  if (z->s->img_n == 3) {
3684  if (z->rgb == 3) {
3685  for (i = 0; i < z->s->img_x; ++i) {
3686  out[0] = y[i];
3687  out[1] = coutput[1][i];
3688  out[2] = coutput[2][i];
3689  out[3] = 255;
3690  out += n;
3691  }
3692  }
3693  else {
3694  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3695  }
3696  }
3697  else
3698  for (i = 0; i < z->s->img_x; ++i) {
3699  out[0] = out[1] = out[2] = y[i];
3700  out[3] = 255; // not used if n==3
3701  out += n;
3702  }
3703  }
3704  else {
3705  stbi_uc *y = coutput[0];
3706  if (n == 1)
3707  for (i = 0; i < z->s->img_x; ++i) out[i] = y[i];
3708  else
3709  for (i = 0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
3710  }
3711  }
3712  stbi__cleanup_jpeg(z);
3713  *out_x = z->s->img_x;
3714  *out_y = z->s->img_y;
3715  if (comp) *comp = z->s->img_n; // report original components, not output
3716  return output;
3717  }
3718 }
3719 
3720 static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
3721 {
3722  unsigned char* result;
3723  stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
3724  j->s = s;
3725  stbi__setup_jpeg(j);
3726  result = load_jpeg_image(j, x, y, comp, req_comp);
3727  STBI_FREE(j);
3728  return result;
3729 }
3730 
3731 static int stbi__jpeg_test(stbi__context *s)
3732 {
3733  int r;
3734  static stbi__jpeg j;
3735  j.s = s;
3736  stbi__setup_jpeg(&j);
3737  r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
3738  stbi__rewind(s);
3739  return r;
3740 }
3741 
3742 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
3743 {
3744  if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
3745  stbi__rewind(j->s);
3746  return 0;
3747  }
3748  if (x) *x = j->s->img_x;
3749  if (y) *y = j->s->img_y;
3750  if (comp) *comp = j->s->img_n;
3751  return 1;
3752 }
3753 
3754 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
3755 {
3756  int result;
3757  stbi__jpeg* j = (stbi__jpeg*)(stbi__malloc(sizeof(stbi__jpeg)));
3758  j->s = s;
3759  result = stbi__jpeg_info_raw(j, x, y, comp);
3760  STBI_FREE(j);
3761  return result;
3762 }
3763 #endif
3764 
3765 // public domain zlib decode v0.2 Sean Barrett 2006-11-18
3766 // simple implementation
3767 // - all input must be provided in an upfront buffer
3768 // - all output is written to a single output buffer (can malloc/realloc)
3769 // performance
3770 // - fast huffman
3771 
3772 #ifndef STBI_NO_ZLIB
3773 
3774 // fast-way is faster to check than jpeg huffman, but slow way is slower
3775 #define STBI__ZFAST_BITS 9 // accelerate all cases in default tables
3776 #define STBI__ZFAST_MASK ((1 << STBI__ZFAST_BITS) - 1)
3777 
3778 // zlib-style huffman encoding
3779 // (jpegs packs from left, zlib from right, so can't share code)
3780 typedef struct
3781 {
3782  stbi__uint16 fast[1 << STBI__ZFAST_BITS];
3783  stbi__uint16 firstcode[16];
3784  int maxcode[17];
3785  stbi__uint16 firstsymbol[16];
3786  stbi_uc size[288];
3787  stbi__uint16 value[288];
3788 } stbi__zhuffman;
3789 
3790 stbi_inline static int stbi__bitreverse16(int n)
3791 {
3792  n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);
3793  n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);
3794  n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);
3795  n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);
3796  return n;
3797 }
3798 
3799 stbi_inline static int stbi__bit_reverse(int v, int bits)
3800 {
3801  STBI_ASSERT(bits <= 16);
3802  // to bit reverse n bits, reverse 16 and shift
3803  // e.g. 11 bits, bit reverse and shift away 5
3804  return stbi__bitreverse16(v) >> (16 - bits);
3805 }
3806 
3807 static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
3808 {
3809  int i, k = 0;
3810  int code, next_code[16], sizes[17];
3811 
3812  // DEFLATE spec for generating codes
3813  memset(sizes, 0, sizeof(sizes));
3814  memset(z->fast, 0, sizeof(z->fast));
3815  for (i = 0; i < num; ++i)
3816  ++sizes[sizelist[i]];
3817  sizes[0] = 0;
3818  for (i = 1; i < 16; ++i)
3819  if (sizes[i] >(1 << i))
3820  return stbi__err("bad sizes", "Corrupt PNG");
3821  code = 0;
3822  for (i = 1; i < 16; ++i) {
3823  next_code[i] = code;
3824  z->firstcode[i] = (stbi__uint16)code;
3825  z->firstsymbol[i] = (stbi__uint16)k;
3826  code = (code + sizes[i]);
3827  if (sizes[i])
3828  if (code - 1 >= (1 << i)) return stbi__err("bad codelengths", "Corrupt PNG");
3829  z->maxcode[i] = code << (16 - i); // preshift for inner loop
3830  code <<= 1;
3831  k += sizes[i];
3832  }
3833  z->maxcode[16] = 0x10000; // sentinel
3834  for (i = 0; i < num; ++i) {
3835  int s = sizelist[i];
3836  if (s) {
3837  int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
3838  stbi__uint16 fastv = (stbi__uint16)((s << 9) | i);
3839  z->size[c] = (stbi_uc)s;
3840  z->value[c] = (stbi__uint16)i;
3841  if (s <= STBI__ZFAST_BITS) {
3842  int j = stbi__bit_reverse(next_code[s], s);
3843  while (j < (1 << STBI__ZFAST_BITS)) {
3844  z->fast[j] = fastv;
3845  j += (1 << s);
3846  }
3847  }
3848  ++next_code[s];
3849  }
3850  }
3851  return 1;
3852 }
3853 
3854 // zlib-from-memory implementation for PNG reading
3855 // because PNG allows splitting the zlib stream arbitrarily,
3856 // and it's annoying structurally to have PNG call ZLIB call PNG,
3857 // we require PNG read all the IDATs and combine them into a single
3858 // memory buffer
3859 
3860 typedef struct
3861 {
3862  stbi_uc *zbuffer, *zbuffer_end;
3863  int num_bits;
3864  stbi__uint32 code_buffer;
3865 
3866  char *zout;
3867  char *zout_start;
3868  char *zout_end;
3869  int z_expandable;
3870 
3871  stbi__zhuffman z_length, z_distance;
3872 } stbi__zbuf;
3873 
3874 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
3875 {
3876  if (z->zbuffer >= z->zbuffer_end) return 0;
3877  return *z->zbuffer++;
3878 }
3879 
3880 static void stbi__fill_bits(stbi__zbuf *z)
3881 {
3882  do {
3883  STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
3884  z->code_buffer |= (unsigned int)stbi__zget8(z) << z->num_bits;
3885  z->num_bits += 8;
3886  } while (z->num_bits <= 24);
3887 }
3888 
3889 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
3890 {
3891  unsigned int k;
3892  if (z->num_bits < n) stbi__fill_bits(z);
3893  k = z->code_buffer & ((1 << n) - 1);
3894  z->code_buffer >>= n;
3895  z->num_bits -= n;
3896  return k;
3897 }
3898 
3899 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
3900 {
3901  int b, s, k;
3902  // not resolved by fast table, so compute it the slow way
3903  // use jpeg approach, which requires MSbits at top
3904  k = stbi__bit_reverse(a->code_buffer, 16);
3905  for (s = STBI__ZFAST_BITS + 1; ; ++s)
3906  if (k < z->maxcode[s])
3907  break;
3908  if (s == 16) return -1; // invalid code!
3909  // code size is s, so:
3910  b = (k >> (16 - s)) - z->firstcode[s] + z->firstsymbol[s];
3911  STBI_ASSERT(z->size[b] == s);
3912  a->code_buffer >>= s;
3913  a->num_bits -= s;
3914  return z->value[b];
3915 }
3916 
3917 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
3918 {
3919  int b, s;
3920  if (a->num_bits < 16) stbi__fill_bits(a);
3921  b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
3922  if (b) {
3923  s = b >> 9;
3924  a->code_buffer >>= s;
3925  a->num_bits -= s;
3926  return b & 511;
3927  }
3928  return stbi__zhuffman_decode_slowpath(a, z);
3929 }
3930 
3931 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n) // need to make room for n bytes
3932 {
3933  char *q;
3934  int cur, limit, old_limit;
3935  z->zout = zout;
3936  if (!z->z_expandable) return stbi__err("output buffer limit", "Corrupt PNG");
3937  cur = (int)(z->zout - z->zout_start);
3938  limit = old_limit = (int)(z->zout_end - z->zout_start);
3939  while (cur + n > limit)
3940  limit *= 2;
3941  q = (char *)STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
3942  STBI_NOTUSED(old_limit);
3943  if (q == NULL) return stbi__err("outofmem", "Out of memory");
3944  z->zout_start = q;
3945  z->zout = q + cur;
3946  z->zout_end = q + limit;
3947  return 1;
3948 }
3949 
3950 static int stbi__zlength_base[31] = {
3951  3,4,5,6,7,8,9,10,11,13,
3952  15,17,19,23,27,31,35,43,51,59,
3953  67,83,99,115,131,163,195,227,258,0,0 };
3954 
3955 static int stbi__zlength_extra[31] =
3956 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
3957 
3958 static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
3959 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0 };
3960 
3961 static int stbi__zdist_extra[32] =
3962 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13 };
3963 
3964 static int stbi__parse_huffman_block(stbi__zbuf *a)
3965 {
3966  char *zout = a->zout;
3967  for (;;) {
3968  int z = stbi__zhuffman_decode(a, &a->z_length);
3969  if (z < 256) {
3970  if (z < 0) return stbi__err("bad huffman code", "Corrupt PNG"); // error in huffman codes
3971  if (zout >= a->zout_end) {
3972  if (!stbi__zexpand(a, zout, 1)) return 0;
3973  zout = a->zout;
3974  }
3975  *zout++ = (char)z;
3976  }
3977  else {
3978  stbi_uc *p;
3979  int len, dist;
3980  if (z == 256) {
3981  a->zout = zout;
3982  return 1;
3983  }
3984  z -= 257;
3985  len = stbi__zlength_base[z];
3986  if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
3987  z = stbi__zhuffman_decode(a, &a->z_distance);
3988  if (z < 0) return stbi__err("bad huffman code", "Corrupt PNG");
3989  dist = stbi__zdist_base[z];
3990  if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
3991  if (zout - a->zout_start < dist) return stbi__err("bad dist", "Corrupt PNG");
3992  if (zout + len > a->zout_end) {
3993  if (!stbi__zexpand(a, zout, len)) return 0;
3994  zout = a->zout;
3995  }
3996  p = (stbi_uc *)(zout - dist);
3997  if (dist == 1) { // run of one byte; common in images.
3998  stbi_uc v = *p;
3999  if (len) { do *zout++ = v; while (--len); }
4000  }
4001  else {
4002  if (len) { do *zout++ = *p++; while (--len); }
4003  }
4004  }
4005  }
4006 }
4007 
4008 static int stbi__compute_huffman_codes(stbi__zbuf *a)
4009 {
4010  static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
4011  stbi__zhuffman z_codelength;
4012  stbi_uc lencodes[286 + 32 + 137];//padding for maximum single op
4013  stbi_uc codelength_sizes[19];
4014  int i, n;
4015 
4016  int hlit = stbi__zreceive(a, 5) + 257;
4017  int hdist = stbi__zreceive(a, 5) + 1;
4018  int hclen = stbi__zreceive(a, 4) + 4;
4019  int ntot = hlit + hdist;
4020 
4021  memset(codelength_sizes, 0, sizeof(codelength_sizes));
4022  for (i = 0; i < hclen; ++i) {
4023  int s = stbi__zreceive(a, 3);
4024  codelength_sizes[length_dezigzag[i]] = (stbi_uc)s;
4025  }
4026  if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
4027 
4028  n = 0;
4029  while (n < ntot) {
4030  int c = stbi__zhuffman_decode(a, &z_codelength);
4031  if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
4032  if (c < 16)
4033  lencodes[n++] = (stbi_uc)c;
4034  else {
4035  stbi_uc fill = 0;
4036  if (c == 16) {
4037  c = stbi__zreceive(a, 2) + 3;
4038  if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
4039  fill = lencodes[n - 1];
4040  }
4041  else if (c == 17)
4042  c = stbi__zreceive(a, 3) + 3;
4043  else {
4044  STBI_ASSERT(c == 18);
4045  c = stbi__zreceive(a, 7) + 11;
4046  }
4047  if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
4048  memset(lencodes + n, fill, c);
4049  n += c;
4050  }
4051  }
4052  if (n != ntot) return stbi__err("bad codelengths", "Corrupt PNG");
4053  if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
4054  if (!stbi__zbuild_huffman(&a->z_distance, lencodes + hlit, hdist)) return 0;
4055  return 1;
4056 }
4057 
4058 static int stbi__parse_uncompressed_block(stbi__zbuf *a)
4059 {
4060  stbi_uc header[4];
4061  int len, nlen, k;
4062  if (a->num_bits & 7)
4063  stbi__zreceive(a, a->num_bits & 7); // discard
4064  // drain the bit-packed data into header
4065  k = 0;
4066  while (a->num_bits > 0) {
4067  header[k++] = (stbi_uc)(a->code_buffer & 255); // suppress MSVC run-time check
4068  a->code_buffer >>= 8;
4069  a->num_bits -= 8;
4070  }
4071  STBI_ASSERT(a->num_bits == 0);
4072  // now fill header the normal way
4073  while (k < 4)
4074  header[k++] = stbi__zget8(a);
4075  len = header[1] * 256 + header[0];
4076  nlen = header[3] * 256 + header[2];
4077  if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt", "Corrupt PNG");
4078  if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer", "Corrupt PNG");
4079  if (a->zout + len > a->zout_end)
4080  if (!stbi__zexpand(a, a->zout, len)) return 0;
4081  memcpy(a->zout, a->zbuffer, len);
4082  a->zbuffer += len;
4083  a->zout += len;
4084  return 1;
4085 }
4086 
4087 static int stbi__parse_zlib_header(stbi__zbuf *a)
4088 {
4089  int cmf = stbi__zget8(a);
4090  int cm = cmf & 15;
4091  /* int cinfo = cmf >> 4; */
4092  int flg = stbi__zget8(a);
4093  if ((cmf * 256 + flg) % 31 != 0) return stbi__err("bad zlib header", "Corrupt PNG"); // zlib spec
4094  if (flg & 32) return stbi__err("no preset dict", "Corrupt PNG"); // preset dictionary not allowed in png
4095  if (cm != 8) return stbi__err("bad compression", "Corrupt PNG"); // DEFLATE required for png
4096  // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
4097  return 1;
4098 }
4099 
4100 // @TODO: should statically initialize these for optimal thread safety
4101 static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
4102 static void stbi__init_zdefaults(void)
4103 {
4104  int i; // use <= to match clearly with spec
4105  for (i = 0; i <= 143; ++i) stbi__zdefault_length[i] = 8;
4106  for (; i <= 255; ++i) stbi__zdefault_length[i] = 9;
4107  for (; i <= 279; ++i) stbi__zdefault_length[i] = 7;
4108  for (; i <= 287; ++i) stbi__zdefault_length[i] = 8;
4109 
4110  for (i = 0; i <= 31; ++i) stbi__zdefault_distance[i] = 5;
4111 }
4112 
4113 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
4114 {
4115  int final, type;
4116  if (parse_header)
4117  if (!stbi__parse_zlib_header(a)) return 0;
4118  a->num_bits = 0;
4119  a->code_buffer = 0;
4120  do {
4121  final = stbi__zreceive(a, 1);
4122  type = stbi__zreceive(a, 2);
4123  if (type == 0) {
4124  if (!stbi__parse_uncompressed_block(a)) return 0;
4125  }
4126  else if (type == 3) {
4127  return 0;
4128  }
4129  else {
4130  if (type == 1) {
4131  // use fixed code lengths
4132  if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
4133  if (!stbi__zbuild_huffman(&a->z_length, stbi__zdefault_length, 288)) return 0;
4134  if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance, 32)) return 0;
4135  }
4136  else {
4137  if (!stbi__compute_huffman_codes(a)) return 0;
4138  }
4139  if (!stbi__parse_huffman_block(a)) return 0;
4140  }
4141  } while (!final);
4142  return 1;
4143 }
4144 
4145 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
4146 {
4147  a->zout_start = obuf;
4148  a->zout = obuf;
4149  a->zout_end = obuf + olen;
4150  a->z_expandable = exp;
4151 
4152  return stbi__parse_zlib(a, parse_header);
4153 }
4154 
4155 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
4156 {
4157  stbi__zbuf a;
4158  char *p = (char *)stbi__malloc(initial_size);
4159  if (p == NULL) return NULL;
4160  a.zbuffer = (stbi_uc *)buffer;
4161  a.zbuffer_end = (stbi_uc *)buffer + len;
4162  if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
4163  if (outlen) *outlen = (int)(a.zout - a.zout_start);
4164  return a.zout_start;
4165  }
4166  else {
4167  STBI_FREE(a.zout_start);
4168  return NULL;
4169  }
4170 }
4171 
4172 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
4173 {
4174  return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
4175 }
4176 
4177 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
4178 {
4179  stbi__zbuf a;
4180  char *p = (char *)stbi__malloc(initial_size);
4181  if (p == NULL) return NULL;
4182  a.zbuffer = (stbi_uc *)buffer;
4183  a.zbuffer_end = (stbi_uc *)buffer + len;
4184  if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
4185  if (outlen) *outlen = (int)(a.zout - a.zout_start);
4186  return a.zout_start;
4187  }
4188  else {
4189  STBI_FREE(a.zout_start);
4190  return NULL;
4191  }
4192 }
4193 
4194 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
4195 {
4196  stbi__zbuf a;
4197  a.zbuffer = (stbi_uc *)ibuffer;
4198  a.zbuffer_end = (stbi_uc *)ibuffer + ilen;
4199  if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
4200  return (int)(a.zout - a.zout_start);
4201  else
4202  return -1;
4203 }
4204 
4205 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
4206 {
4207  stbi__zbuf a;
4208  char *p = (char *)stbi__malloc(16384);
4209  if (p == NULL) return NULL;
4210  a.zbuffer = (stbi_uc *)buffer;
4211  a.zbuffer_end = (stbi_uc *)buffer + len;
4212  if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
4213  if (outlen) *outlen = (int)(a.zout - a.zout_start);
4214  return a.zout_start;
4215  }
4216  else {
4217  STBI_FREE(a.zout_start);
4218  return NULL;
4219  }
4220 }
4221 
4222 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
4223 {
4224  stbi__zbuf a;
4225  a.zbuffer = (stbi_uc *)ibuffer;
4226  a.zbuffer_end = (stbi_uc *)ibuffer + ilen;
4227  if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
4228  return (int)(a.zout - a.zout_start);
4229  else
4230  return -1;
4231 }
4232 #endif
4233 
4234 // public domain "baseline" PNG decoder v0.10 Sean Barrett 2006-11-18
4235 // simple implementation
4236 // - only 8-bit samples
4237 // - no CRC checking
4238 // - allocates lots of intermediate memory
4239 // - avoids problem of streaming data between subsystems
4240 // - avoids explicit window management
4241 // performance
4242 // - uses stb_zlib, a PD zlib implementation with fast huffman decoding
4243 
4244 #ifndef STBI_NO_PNG
4245 typedef struct
4246 {
4247  stbi__uint32 length;
4248  stbi__uint32 type;
4249 } stbi__pngchunk;
4250 
4251 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
4252 {
4253  stbi__pngchunk c;
4254  c.length = stbi__get32be(s);
4255  c.type = stbi__get32be(s);
4256  return c;
4257 }
4258 
4259 static int stbi__check_png_header(stbi__context *s)
4260 {
4261  static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
4262  int i;
4263  for (i = 0; i < 8; ++i)
4264  if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig", "Not a PNG");
4265  return 1;
4266 }
4267 
4268 typedef struct
4269 {
4270  stbi__context *s;
4271  stbi_uc *idata, *expanded, *out;
4272  int depth;
4273 } stbi__png;
4274 
4275 
4276 enum {
4277  STBI__F_none = 0,
4278  STBI__F_sub = 1,
4279  STBI__F_up = 2,
4280  STBI__F_avg = 3,
4281  STBI__F_paeth = 4,
4282  // synthetic filters used for first scanline to avoid needing a dummy row of 0s
4283  STBI__F_avg_first,
4284  STBI__F_paeth_first
4285 };
4286 
4287 static stbi_uc first_row_filter[5] =
4288 {
4289  STBI__F_none,
4290  STBI__F_sub,
4291  STBI__F_none,
4292  STBI__F_avg_first,
4293  STBI__F_paeth_first
4294 };
4295 
4296 static int stbi__paeth(int a, int b, int c)
4297 {
4298  int p = a + b - c;
4299  int pa = abs(p - a);
4300  int pb = abs(p - b);
4301  int pc = abs(p - c);
4302  if (pa <= pb && pa <= pc) return a;
4303  if (pb <= pc) return b;
4304  return c;
4305 }
4306 
4307 static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
4308 
4309 // create the png data from post-deflated data
4310 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
4311 {
4312  int bytes = (depth == 16 ? 2 : 1);
4313  stbi__context *s = a->s;
4314  stbi__uint32 i, j, stride = x*out_n*bytes;
4315  stbi__uint32 img_len, img_width_bytes;
4316  int k;
4317  int img_n = s->img_n; // copy it into a local for later
4318 
4319  int output_bytes = out_n*bytes;
4320  int filter_bytes = img_n*bytes;
4321  int width = x;
4322 
4323  STBI_ASSERT(out_n == s->img_n || out_n == s->img_n + 1);
4324  a->out = (stbi_uc *)stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
4325  if (!a->out) return stbi__err("outofmem", "Out of memory");
4326 
4327  img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4328  img_len = (img_width_bytes + 1) * y;
4329  if (s->img_x == x && s->img_y == y) {
4330  if (raw_len != img_len) return stbi__err("not enough pixels", "Corrupt PNG");
4331  }
4332  else { // interlaced:
4333  if (raw_len < img_len) return stbi__err("not enough pixels", "Corrupt PNG");
4334  }
4335 
4336  for (j = 0; j < y; ++j) {
4337  stbi_uc *cur = a->out + stride*j;
4338  stbi_uc *prior = cur - stride;
4339  int filter = *raw++;
4340 
4341  if (filter > 4)
4342  return stbi__err("invalid filter", "Corrupt PNG");
4343 
4344  if (depth < 8) {
4345  STBI_ASSERT(img_width_bytes <= x);
4346  cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
4347  filter_bytes = 1;
4348  width = img_width_bytes;
4349  }
4350 
4351  // if first row, use special filter that doesn't sample previous row
4352  if (j == 0) filter = first_row_filter[filter];
4353 
4354  // handle first byte explicitly
4355  for (k = 0; k < filter_bytes; ++k) {
4356  switch (filter) {
4357  case STBI__F_none: cur[k] = raw[k]; break;
4358  case STBI__F_sub: cur[k] = raw[k]; break;
4359  case STBI__F_up: cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4360  case STBI__F_avg: cur[k] = STBI__BYTECAST(raw[k] + (prior[k] >> 1)); break;
4361  case STBI__F_paeth: cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0, prior[k], 0)); break;
4362  case STBI__F_avg_first: cur[k] = raw[k]; break;
4363  case STBI__F_paeth_first: cur[k] = raw[k]; break;
4364  }
4365  }
4366 
4367  if (depth == 8) {
4368  if (img_n != out_n)
4369  cur[img_n] = 255; // first pixel
4370  raw += img_n;
4371  cur += out_n;
4372  prior += out_n;
4373  }
4374  else if (depth == 16) {
4375  if (img_n != out_n) {
4376  cur[filter_bytes] = 255; // first pixel top byte
4377  cur[filter_bytes + 1] = 255; // first pixel bottom byte
4378  }
4379  raw += filter_bytes;
4380  cur += output_bytes;
4381  prior += output_bytes;
4382  }
4383  else {
4384  raw += 1;
4385  cur += 1;
4386  prior += 1;
4387  }
4388 
4389  // this is a little gross, so that we don't switch per-pixel or per-component
4390  if (depth < 8 || img_n == out_n) {
4391  int nk = (width - 1)*filter_bytes;
4392 #define STBI__CASE(f) \
4393  case f: \
4394  for (k=0; k < nk; ++k)
4395  switch (filter) {
4396  // "none" filter turns into a memcpy here; make that explicit.
4397  case STBI__F_none: memcpy(cur, raw, nk); break;
4398  STBI__CASE(STBI__F_sub) { cur[k] = STBI__BYTECAST(raw[k] + cur[k - filter_bytes]); } break;
4399  STBI__CASE(STBI__F_up) { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
4400  STBI__CASE(STBI__F_avg) { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k - filter_bytes]) >> 1)); } break;
4401  STBI__CASE(STBI__F_paeth) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k - filter_bytes], prior[k], prior[k - filter_bytes])); } break;
4402  STBI__CASE(STBI__F_avg_first) { cur[k] = STBI__BYTECAST(raw[k] + (cur[k - filter_bytes] >> 1)); } break;
4403  STBI__CASE(STBI__F_paeth_first) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k - filter_bytes], 0, 0)); } break;
4404  }
4405 #undef STBI__CASE
4406  raw += nk;
4407  }
4408  else {
4409  STBI_ASSERT(img_n + 1 == out_n);
4410 #define STBI__CASE(f) \
4411  case f: \
4412  for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
4413  for (k=0; k < filter_bytes; ++k)
4414  switch (filter) {
4415  STBI__CASE(STBI__F_none) { cur[k] = raw[k]; } break;
4416  STBI__CASE(STBI__F_sub) { cur[k] = STBI__BYTECAST(raw[k] + cur[k - output_bytes]); } break;
4417  STBI__CASE(STBI__F_up) { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
4418  STBI__CASE(STBI__F_avg) { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k - output_bytes]) >> 1)); } break;
4419  STBI__CASE(STBI__F_paeth) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k - output_bytes], prior[k], prior[k - output_bytes])); } break;
4420  STBI__CASE(STBI__F_avg_first) { cur[k] = STBI__BYTECAST(raw[k] + (cur[k - output_bytes] >> 1)); } break;
4421  STBI__CASE(STBI__F_paeth_first) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k - output_bytes], 0, 0)); } break;
4422  }
4423 #undef STBI__CASE
4424 
4425  // the loop above sets the high byte of the pixels' alpha, but for
4426  // 16 bit png files we also need the low byte set. we'll do that here.
4427  if (depth == 16) {
4428  cur = a->out + stride*j; // start at the beginning of the row again
4429  for (i = 0; i < x; ++i, cur += output_bytes) {
4430  cur[filter_bytes + 1] = 255;
4431  }
4432  }
4433  }
4434  }
4435 
4436  // we make a separate pass to expand bits to pixels; for performance,
4437  // this could run two scanlines behind the above code, so it won't
4438  // intefere with filtering but will still be in the cache.
4439  if (depth < 8) {
4440  for (j = 0; j < y; ++j) {
4441  stbi_uc *cur = a->out + stride*j;
4442  stbi_uc *in = a->out + stride*j + x*out_n - img_width_bytes;
4443  // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
4444  // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4445  stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4446 
4447  // note that the final byte might overshoot and write more data than desired.
4448  // we can allocate enough data that this never writes out of memory, but it
4449  // could also overwrite the next scanline. can it overwrite non-empty data
4450  // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4451  // so we need to explicitly clamp the final ones
4452 
4453  if (depth == 4) {
4454  for (k = x*img_n; k >= 2; k -= 2, ++in) {
4455  *cur++ = scale * ((*in >> 4));
4456  *cur++ = scale * ((*in) & 0x0f);
4457  }
4458  if (k > 0) *cur++ = scale * ((*in >> 4));
4459  }
4460  else if (depth == 2) {
4461  for (k = x*img_n; k >= 4; k -= 4, ++in) {
4462  *cur++ = scale * ((*in >> 6));
4463  *cur++ = scale * ((*in >> 4) & 0x03);
4464  *cur++ = scale * ((*in >> 2) & 0x03);
4465  *cur++ = scale * ((*in) & 0x03);
4466  }
4467  if (k > 0) *cur++ = scale * ((*in >> 6));
4468  if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4469  if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4470  }
4471  else if (depth == 1) {
4472  for (k = x*img_n; k >= 8; k -= 8, ++in) {
4473  *cur++ = scale * ((*in >> 7));
4474  *cur++ = scale * ((*in >> 6) & 0x01);
4475  *cur++ = scale * ((*in >> 5) & 0x01);
4476  *cur++ = scale * ((*in >> 4) & 0x01);
4477  *cur++ = scale * ((*in >> 3) & 0x01);
4478  *cur++ = scale * ((*in >> 2) & 0x01);
4479  *cur++ = scale * ((*in >> 1) & 0x01);
4480  *cur++ = scale * ((*in) & 0x01);
4481  }
4482  if (k > 0) *cur++ = scale * ((*in >> 7));
4483  if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4484  if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4485  if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4486  if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4487  if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4488  if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4489  }
4490  if (img_n != out_n) {
4491  int q;
4492  // insert alpha = 255
4493  cur = a->out + stride*j;
4494  if (img_n == 1) {
4495  for (q = x - 1; q >= 0; --q) {
4496  cur[q * 2 + 1] = 255;
4497  cur[q * 2 + 0] = cur[q];
4498  }
4499  }
4500  else {
4501  STBI_ASSERT(img_n == 3);
4502  for (q = x - 1; q >= 0; --q) {
4503  cur[q * 4 + 3] = 255;
4504  cur[q * 4 + 2] = cur[q * 3 + 2];
4505  cur[q * 4 + 1] = cur[q * 3 + 1];
4506  cur[q * 4 + 0] = cur[q * 3 + 0];
4507  }
4508  }
4509  }
4510  }
4511  }
4512  else if (depth == 16) {
4513  // force the image data from big-endian to platform-native.
4514  // this is done in a separate pass due to the decoding relying
4515  // on the data being untouched, but could probably be done
4516  // per-line during decode if care is taken.
4517  stbi_uc *cur = a->out;
4518  stbi__uint16 *cur16 = (stbi__uint16*)cur;
4519 
4520  for (i = 0; i < x*y*out_n; ++i, cur16++, cur += 2) {
4521  *cur16 = (cur[0] << 8) | cur[1];
4522  }
4523  }
4524 
4525  return 1;
4526 }
4527 
4528 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4529 {
4530  int bytes = (depth == 16 ? 2 : 1);
4531  int out_bytes = out_n * bytes;
4532  stbi_uc *final;
4533  int p;
4534  if (!interlaced)
4535  return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4536 
4537  // de-interlacing
4538  final = (stbi_uc *)stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
4539  for (p = 0; p < 7; ++p) {
4540  int xorig[] = { 0,4,0,2,0,1,0 };
4541  int yorig[] = { 0,0,4,0,2,0,1 };
4542  int xspc[] = { 8,8,4,4,2,2,1 };
4543  int yspc[] = { 8,8,8,4,4,2,2 };
4544  int i, j, x, y;
4545  // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4546  x = (a->s->img_x - xorig[p] + xspc[p] - 1) / xspc[p];
4547  y = (a->s->img_y - yorig[p] + yspc[p] - 1) / yspc[p];
4548  if (x && y) {
4549  stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4550  if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4551  STBI_FREE(final);
4552  return 0;
4553  }
4554  for (j = 0; j < y; ++j) {
4555  for (i = 0; i < x; ++i) {
4556  int out_y = j*yspc[p] + yorig[p];
4557  int out_x = i*xspc[p] + xorig[p];
4558  memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
4559  a->out + (j*x + i)*out_bytes, out_bytes);
4560  }
4561  }
4562  STBI_FREE(a->out);
4563  image_data += img_len;
4564  image_data_len -= img_len;
4565  }
4566  }
4567  a->out = final;
4568 
4569  return 1;
4570 }
4571 
4572 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4573 {
4574  stbi__context *s = z->s;
4575  stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4576  stbi_uc *p = z->out;
4577 
4578  // compute color-based transparency, assuming we've
4579  // already got 255 as the alpha value in the output
4580  STBI_ASSERT(out_n == 2 || out_n == 4);
4581 
4582  if (out_n == 2) {
4583  for (i = 0; i < pixel_count; ++i) {
4584  p[1] = (p[0] == tc[0] ? 0 : 255);
4585  p += 2;
4586  }
4587  }
4588  else {
4589  for (i = 0; i < pixel_count; ++i) {
4590  if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4591  p[3] = 0;
4592  p += 4;
4593  }
4594  }
4595  return 1;
4596 }
4597 
4598 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
4599 {
4600  stbi__context *s = z->s;
4601  stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4602  stbi__uint16 *p = (stbi__uint16*)z->out;
4603 
4604  // compute color-based transparency, assuming we've
4605  // already got 65535 as the alpha value in the output
4606  STBI_ASSERT(out_n == 2 || out_n == 4);
4607 
4608  if (out_n == 2) {
4609  for (i = 0; i < pixel_count; ++i) {
4610  p[1] = (p[0] == tc[0] ? 0 : 65535);
4611  p += 2;
4612  }
4613  }
4614  else {
4615  for (i = 0; i < pixel_count; ++i) {
4616  if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4617  p[3] = 0;
4618  p += 4;
4619  }
4620  }
4621  return 1;
4622 }
4623 
4624 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4625 {
4626  stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4627  stbi_uc *p, *temp_out, *orig = a->out;
4628 
4629  p = (stbi_uc *)stbi__malloc_mad2(pixel_count, pal_img_n, 0);
4630  if (p == NULL) return stbi__err("outofmem", "Out of memory");
4631 
4632  // between here and free(out) below, exitting would leak
4633  temp_out = p;
4634 
4635  if (pal_img_n == 3) {
4636  for (i = 0; i < pixel_count; ++i) {
4637  int n = orig[i] * 4;
4638  p[0] = palette[n];
4639  p[1] = palette[n + 1];
4640  p[2] = palette[n + 2];
4641  p += 3;
4642  }
4643  }
4644  else {
4645  for (i = 0; i < pixel_count; ++i) {
4646  int n = orig[i] * 4;
4647  p[0] = palette[n];
4648  p[1] = palette[n + 1];
4649  p[2] = palette[n + 2];
4650  p[3] = palette[n + 3];
4651  p += 4;
4652  }
4653  }
4654  STBI_FREE(a->out);
4655  a->out = temp_out;
4656 
4657  STBI_NOTUSED(len);
4658 
4659  return 1;
4660 }
4661 
4662 static int stbi__unpremultiply_on_load = 0;
4663 static int stbi__de_iphone_flag = 0;
4664 
4665 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4666 {
4667  stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4668 }
4669 
4670 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4671 {
4672  stbi__de_iphone_flag = flag_true_if_should_convert;
4673 }
4674 
4675 static void stbi__de_iphone(stbi__png *z)
4676 {
4677  stbi__context *s = z->s;
4678  stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4679  stbi_uc *p = z->out;
4680 
4681  if (s->img_out_n == 3) { // convert bgr to rgb
4682  for (i = 0; i < pixel_count; ++i) {
4683  stbi_uc t = p[0];
4684  p[0] = p[2];
4685  p[2] = t;
4686  p += 3;
4687  }
4688  }
4689  else {
4690  STBI_ASSERT(s->img_out_n == 4);
4691  if (stbi__unpremultiply_on_load) {
4692  // convert bgr to rgb and unpremultiply
4693  for (i = 0; i < pixel_count; ++i) {
4694  stbi_uc a = p[3];
4695  stbi_uc t = p[0];
4696  if (a) {
4697  p[0] = p[2] * 255 / a;
4698  p[1] = p[1] * 255 / a;
4699  p[2] = t * 255 / a;
4700  }
4701  else {
4702  p[0] = p[2];
4703  p[2] = t;
4704  }
4705  p += 4;
4706  }
4707  }
4708  else {
4709  // convert bgr to rgb
4710  for (i = 0; i < pixel_count; ++i) {
4711  stbi_uc t = p[0];
4712  p[0] = p[2];
4713  p[2] = t;
4714  p += 4;
4715  }
4716  }
4717  }
4718 }
4719 
4720 #define STBI__PNG_TYPE(a,b,c,d) (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
4721 
4722 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
4723 {
4724  stbi_uc palette[1024], pal_img_n = 0;
4725  stbi_uc has_trans = 0, tc[3];
4726  stbi__uint16 tc16[3];
4727  stbi__uint32 ioff = 0, idata_limit = 0, i, pal_len = 0;
4728  int first = 1, k, interlace = 0, color = 0, is_iphone = 0;
4729  stbi__context *s = z->s;
4730 
4731  z->expanded = NULL;
4732  z->idata = NULL;
4733  z->out = NULL;
4734 
4735  if (!stbi__check_png_header(s)) return 0;
4736 
4737  if (scan == STBI__SCAN_type) return 1;
4738 
4739  for (;;) {
4740  stbi__pngchunk c = stbi__get_chunk_header(s);
4741  switch (c.type) {
4742  case STBI__PNG_TYPE('C', 'g', 'B', 'I'):
4743  is_iphone = 1;
4744  stbi__skip(s, c.length);
4745  break;
4746  case STBI__PNG_TYPE('I', 'H', 'D', 'R'): {
4747  int comp, filter;
4748  if (!first) return stbi__err("multiple IHDR", "Corrupt PNG");
4749  first = 0;
4750  if (c.length != 13) return stbi__err("bad IHDR len", "Corrupt PNG");
4751  s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large", "Very large image (corrupt?)");
4752  s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large", "Very large image (corrupt?)");
4753  z->depth = stbi__get8(s); if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16) return stbi__err("1/2/4/8/16-bit only", "PNG not supported: 1/2/4/8/16-bit only");
4754  color = stbi__get8(s); if (color > 6) return stbi__err("bad ctype", "Corrupt PNG");
4755  if (color == 3 && z->depth == 16) return stbi__err("bad ctype", "Corrupt PNG");
4756  if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype", "Corrupt PNG");
4757  comp = stbi__get8(s); if (comp) return stbi__err("bad comp method", "Corrupt PNG");
4758  filter = stbi__get8(s); if (filter) return stbi__err("bad filter method", "Corrupt PNG");
4759  interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method", "Corrupt PNG");
4760  if (!s->img_x || !s->img_y) return stbi__err("0-pixel image", "Corrupt PNG");
4761  if (!pal_img_n) {
4762  s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4763  if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
4764  if (scan == STBI__SCAN_header) return 1;
4765  }
4766  else {
4767  // if paletted, then pal_n is our final components, and
4768  // img_n is # components to decompress/filter.
4769  s->img_n = 1;
4770  if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large", "Corrupt PNG");
4771  // if SCAN_header, have to scan to see if we have a tRNS
4772  }
4773  break;
4774  }
4775 
4776  case STBI__PNG_TYPE('P', 'L', 'T', 'E'): {
4777  if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4778  if (c.length > 256 * 3) return stbi__err("invalid PLTE", "Corrupt PNG");
4779  pal_len = c.length / 3;
4780  if (pal_len * 3 != c.length) return stbi__err("invalid PLTE", "Corrupt PNG");
4781  for (i = 0; i < pal_len; ++i) {
4782  palette[i * 4 + 0] = stbi__get8(s);
4783  palette[i * 4 + 1] = stbi__get8(s);
4784  palette[i * 4 + 2] = stbi__get8(s);
4785  palette[i * 4 + 3] = 255;
4786  }
4787  break;
4788  }
4789 
4790  case STBI__PNG_TYPE('t', 'R', 'N', 'S'): {
4791  if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4792  if (z->idata) return stbi__err("tRNS after IDAT", "Corrupt PNG");
4793  if (pal_img_n) {
4794  if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
4795  if (pal_len == 0) return stbi__err("tRNS before PLTE", "Corrupt PNG");
4796  if (c.length > pal_len) return stbi__err("bad tRNS len", "Corrupt PNG");
4797  pal_img_n = 4;
4798  for (i = 0; i < c.length; ++i)
4799  palette[i * 4 + 3] = stbi__get8(s);
4800  }
4801  else {
4802  if (!(s->img_n & 1)) return stbi__err("tRNS with alpha", "Corrupt PNG");
4803  if (c.length != (stbi__uint32)s->img_n * 2) return stbi__err("bad tRNS len", "Corrupt PNG");
4804  has_trans = 1;
4805  if (z->depth == 16) {
4806  for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
4807  }
4808  else {
4809  for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
4810  }
4811  }
4812  break;
4813  }
4814 
4815  case STBI__PNG_TYPE('I', 'D', 'A', 'T'): {
4816  if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4817  if (pal_img_n && !pal_len) return stbi__err("no PLTE", "Corrupt PNG");
4818  if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
4819  if ((int)(ioff + c.length) < (int)ioff) return 0;
4820  if (ioff + c.length > idata_limit) {
4821  stbi__uint32 idata_limit_old = idata_limit;
4822  stbi_uc *p;
4823  if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
4824  while (ioff + c.length > idata_limit)
4825  idata_limit *= 2;
4826  STBI_NOTUSED(idata_limit_old);
4827  p = (stbi_uc *)STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
4828  z->idata = p;
4829  }
4830  if (!stbi__getn(s, z->idata + ioff, c.length)) return stbi__err("outofdata", "Corrupt PNG");
4831  ioff += c.length;
4832  break;
4833  }
4834 
4835  case STBI__PNG_TYPE('I', 'E', 'N', 'D'): {
4836  stbi__uint32 raw_len, bpl;
4837  if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4838  if (scan != STBI__SCAN_load) return 1;
4839  if (z->idata == NULL) return stbi__err("no IDAT", "Corrupt PNG");
4840  // initial guess for decoded data size to avoid unnecessary reallocs
4841  bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
4842  raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
4843  z->expanded = (stbi_uc *)stbi_zlib_decode_malloc_guesssize_headerflag((char *)z->idata, ioff, raw_len, (int *)&raw_len, !is_iphone);
4844  if (z->expanded == NULL) return 0; // zlib should set error
4845  STBI_FREE(z->idata); z->idata = NULL;
4846  if ((req_comp == s->img_n + 1 && req_comp != 3 && !pal_img_n) || has_trans)
4847  s->img_out_n = s->img_n + 1;
4848  else
4849  s->img_out_n = s->img_n;
4850  if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
4851  if (has_trans) {
4852  if (z->depth == 16) {
4853  if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
4854  }
4855  else {
4856  if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
4857  }
4858  }
4859  if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
4860  stbi__de_iphone(z);
4861  if (pal_img_n) {
4862  // pal_img_n == 3 or 4
4863  s->img_n = pal_img_n; // record the actual colors we had
4864  s->img_out_n = pal_img_n;
4865  if (req_comp >= 3) s->img_out_n = req_comp;
4866  if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
4867  return 0;
4868  }
4869  STBI_FREE(z->expanded); z->expanded = NULL;
4870  return 1;
4871  }
4872 
4873  default:
4874  // if critical, fail
4875  if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4876  if ((c.type & (1 << 29)) == 0) {
4877 #ifndef STBI_NO_FAILURE_STRINGS
4878  // not threadsafe
4879  static char invalid_chunk[] = "XXXX PNG chunk not known";
4880  invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
4881  invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
4882  invalid_chunk[2] = STBI__BYTECAST(c.type >> 8);
4883  invalid_chunk[3] = STBI__BYTECAST(c.type >> 0);
4884 #endif
4885  return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
4886  }
4887  stbi__skip(s, c.length);
4888  break;
4889  }
4890  // end of PNG chunk, read and skip CRC
4891  stbi__get32be(s);
4892  }
4893 }
4894 
4895 static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
4896 {
4897  void *result = NULL;
4898  if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
4899  if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
4900  ri->bits_per_channel = p->depth;
4901  result = p->out;
4902  p->out = NULL;
4903  if (req_comp && req_comp != p->s->img_out_n) {
4904  if (ri->bits_per_channel == 8)
4905  result = stbi__convert_format((unsigned char *)result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4906  else
4907  result = stbi__convert_format16((stbi__uint16 *)result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4908  p->s->img_out_n = req_comp;
4909  if (result == NULL) return result;
4910  }
4911  *x = p->s->img_x;
4912  *y = p->s->img_y;
4913  if (n) *n = p->s->img_n;
4914  }
4915  STBI_FREE(p->out); p->out = NULL;
4916  STBI_FREE(p->expanded); p->expanded = NULL;
4917  STBI_FREE(p->idata); p->idata = NULL;
4918 
4919  return result;
4920 }
4921 
4922 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
4923 {
4924  stbi__png p;
4925  p.s = s;
4926  return stbi__do_png(&p, x, y, comp, req_comp, ri);
4927 }
4928 
4929 static int stbi__png_test(stbi__context *s)
4930 {
4931  int r;
4932  r = stbi__check_png_header(s);
4933  stbi__rewind(s);
4934  return r;
4935 }
4936 
4937 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
4938 {
4939  if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
4940  stbi__rewind(p->s);
4941  return 0;
4942  }
4943  if (x) *x = p->s->img_x;
4944  if (y) *y = p->s->img_y;
4945  if (comp) *comp = p->s->img_n;
4946  return 1;
4947 }
4948 
4949 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
4950 {
4951  stbi__png p;
4952  p.s = s;
4953  return stbi__png_info_raw(&p, x, y, comp);
4954 }
4955 #endif
4956 
4957 // Microsoft/Windows BMP image
4958 
4959 #ifndef STBI_NO_BMP
4960 static int stbi__bmp_test_raw(stbi__context *s)
4961 {
4962  int r;
4963  int sz;
4964  if (stbi__get8(s) != 'B') return 0;
4965  if (stbi__get8(s) != 'M') return 0;
4966  stbi__get32le(s); // discard filesize
4967  stbi__get16le(s); // discard reserved
4968  stbi__get16le(s); // discard reserved
4969  stbi__get32le(s); // discard data offset
4970  sz = stbi__get32le(s);
4971  r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
4972  return r;
4973 }
4974 
4975 static int stbi__bmp_test(stbi__context *s)
4976 {
4977  int r = stbi__bmp_test_raw(s);
4978  stbi__rewind(s);
4979  return r;
4980 }
4981 
4982 
4983 // returns 0..31 for the highest set bit
4984 static int stbi__high_bit(unsigned int z)
4985 {
4986  int n = 0;
4987  if (z == 0) return -1;
4988  if (z >= 0x10000) n += 16, z >>= 16;
4989  if (z >= 0x00100) n += 8, z >>= 8;
4990  if (z >= 0x00010) n += 4, z >>= 4;
4991  if (z >= 0x00004) n += 2, z >>= 2;
4992  if (z >= 0x00002) n += 1, z >>= 1;
4993  return n;
4994 }
4995 
4996 static int stbi__bitcount(unsigned int a)
4997 {
4998  a = (a & 0x55555555) + ((a >> 1) & 0x55555555); // max 2
4999  a = (a & 0x33333333) + ((a >> 2) & 0x33333333); // max 4
5000  a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
5001  a = (a + (a >> 8)); // max 16 per 8 bits
5002  a = (a + (a >> 16)); // max 32 per 8 bits
5003  return a & 0xff;
5004 }
5005 
5006 static int stbi__shiftsigned(int v, int shift, int bits)
5007 {
5008  int result;
5009  int z = 0;
5010 
5011  if (shift < 0) v <<= -shift;
5012  else v >>= shift;
5013  result = v;
5014 
5015  z = bits;
5016  while (z < 8) {
5017  result += v >> z;
5018  z += bits;
5019  }
5020  return result;
5021 }
5022 
5023 typedef struct
5024 {
5025  int bpp, offset, hsz;
5026  unsigned int mr, mg, mb, ma, all_a;
5027 } stbi__bmp_data;
5028 
5029 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
5030 {
5031  int hsz;
5032  if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
5033  stbi__get32le(s); // discard filesize
5034  stbi__get16le(s); // discard reserved
5035  stbi__get16le(s); // discard reserved
5036  info->offset = stbi__get32le(s);
5037  info->hsz = hsz = stbi__get32le(s);
5038  info->mr = info->mg = info->mb = info->ma = 0;
5039 
5040  if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
5041  if (hsz == 12) {
5042  s->img_x = stbi__get16le(s);
5043  s->img_y = stbi__get16le(s);
5044  }
5045  else {
5046  s->img_x = stbi__get32le(s);
5047  s->img_y = stbi__get32le(s);
5048  }
5049  if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
5050  info->bpp = stbi__get16le(s);
5051  if (info->bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
5052  if (hsz != 12) {
5053  int compress = stbi__get32le(s);
5054  if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
5055  stbi__get32le(s); // discard sizeof
5056  stbi__get32le(s); // discard hres
5057  stbi__get32le(s); // discard vres
5058  stbi__get32le(s); // discard colorsused
5059  stbi__get32le(s); // discard max important
5060  if (hsz == 40 || hsz == 56) {
5061  if (hsz == 56) {
5062  stbi__get32le(s);
5063  stbi__get32le(s);
5064  stbi__get32le(s);
5065  stbi__get32le(s);
5066  }
5067  if (info->bpp == 16 || info->bpp == 32) {
5068  if (compress == 0) {
5069  if (info->bpp == 32) {
5070  info->mr = 0xffu << 16;
5071  info->mg = 0xffu << 8;
5072  info->mb = 0xffu << 0;
5073  info->ma = 0xffu << 24;
5074  info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
5075  }
5076  else {
5077  info->mr = 31u << 10;
5078  info->mg = 31u << 5;
5079  info->mb = 31u << 0;
5080  }
5081  }
5082  else if (compress == 3) {
5083  info->mr = stbi__get32le(s);
5084  info->mg = stbi__get32le(s);
5085  info->mb = stbi__get32le(s);
5086  // not documented, but generated by photoshop and handled by mspaint
5087  if (info->mr == info->mg && info->mg == info->mb) {
5088  // ?!?!?
5089  return stbi__errpuc("bad BMP", "bad BMP");
5090  }
5091  }
5092  else
5093  return stbi__errpuc("bad BMP", "bad BMP");
5094  }
5095  }
5096  else {
5097  int i;
5098  if (hsz != 108 && hsz != 124)
5099  return stbi__errpuc("bad BMP", "bad BMP");
5100  info->mr = stbi__get32le(s);
5101  info->mg = stbi__get32le(s);
5102  info->mb = stbi__get32le(s);
5103  info->ma = stbi__get32le(s);
5104  stbi__get32le(s); // discard color space
5105  for (i = 0; i < 12; ++i)
5106  stbi__get32le(s); // discard color space parameters
5107  if (hsz == 124) {
5108  stbi__get32le(s); // discard rendering intent
5109  stbi__get32le(s); // discard offset of profile data
5110  stbi__get32le(s); // discard size of profile data
5111  stbi__get32le(s); // discard reserved
5112  }
5113  }
5114  }
5115  return (void *)1;
5116 }
5117 
5118 
5119 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5120 {
5121  stbi_uc *out;
5122  unsigned int mr = 0, mg = 0, mb = 0, ma = 0, all_a;
5123  stbi_uc pal[256][4];
5124  int psize = 0, i, j, width;
5125  int flip_vertically, pad, target;
5126  stbi__bmp_data info;
5127  STBI_NOTUSED(ri);
5128 
5129  info.all_a = 255;
5130  if (stbi__bmp_parse_header(s, &info) == NULL)
5131  return NULL; // error code already set
5132 
5133  flip_vertically = ((int)s->img_y) > 0;
5134  s->img_y = abs((int)s->img_y);
5135 
5136  mr = info.mr;
5137  mg = info.mg;
5138  mb = info.mb;
5139  ma = info.ma;
5140  all_a = info.all_a;
5141 
5142  if (info.hsz == 12) {
5143  if (info.bpp < 24)
5144  psize = (info.offset - 14 - 24) / 3;
5145  }
5146  else {
5147  if (info.bpp < 16)
5148  psize = (info.offset - 14 - info.hsz) >> 2;
5149  }
5150 
5151  s->img_n = ma ? 4 : 3;
5152  if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
5153  target = req_comp;
5154  else
5155  target = s->img_n; // if they want monochrome, we'll post-convert
5156 
5157  // sanity-check size
5158  if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
5159  return stbi__errpuc("too large", "Corrupt BMP");
5160 
5161  out = (stbi_uc *)stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
5162  if (!out) return stbi__errpuc("outofmem", "Out of memory");
5163  if (info.bpp < 16) {
5164  int z = 0;
5165  if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
5166  for (i = 0; i < psize; ++i) {
5167  pal[i][2] = stbi__get8(s);
5168  pal[i][1] = stbi__get8(s);
5169  pal[i][0] = stbi__get8(s);
5170  if (info.hsz != 12) stbi__get8(s);
5171  pal[i][3] = 255;
5172  }
5173  stbi__skip(s, info.offset - 14 - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
5174  if (info.bpp == 4) width = (s->img_x + 1) >> 1;
5175  else if (info.bpp == 8) width = s->img_x;
5176  else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
5177  pad = (-width) & 3;
5178  for (j = 0; j < (int)s->img_y; ++j) {
5179  for (i = 0; i < (int)s->img_x; i += 2) {
5180  int v = stbi__get8(s), v2 = 0;
5181  if (info.bpp == 4) {
5182  v2 = v & 15;
5183  v >>= 4;
5184  }
5185  out[z++] = pal[v][0];
5186  out[z++] = pal[v][1];
5187  out[z++] = pal[v][2];
5188  if (target == 4) out[z++] = 255;
5189  if (i + 1 == (int)s->img_x) break;
5190  v = (info.bpp == 8) ? stbi__get8(s) : v2;
5191  out[z++] = pal[v][0];
5192  out[z++] = pal[v][1];
5193  out[z++] = pal[v][2];
5194  if (target == 4) out[z++] = 255;
5195  }
5196  stbi__skip(s, pad);
5197  }
5198  }
5199  else {
5200  int rshift = 0, gshift = 0, bshift = 0, ashift = 0, rcount = 0, gcount = 0, bcount = 0, acount = 0;
5201  int z = 0;
5202  int easy = 0;
5203  stbi__skip(s, info.offset - 14 - info.hsz);
5204  if (info.bpp == 24) width = 3 * s->img_x;
5205  else if (info.bpp == 16) width = 2 * s->img_x;
5206  else /* bpp = 32 and pad = 0 */ width = 0;
5207  pad = (-width) & 3;
5208  if (info.bpp == 24) {
5209  easy = 1;
5210  }
5211  else if (info.bpp == 32) {
5212  if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
5213  easy = 2;
5214  }
5215  if (!easy) {
5216  if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
5217  // right shift amt to put high bit in position #7
5218  rshift = stbi__high_bit(mr) - 7; rcount = stbi__bitcount(mr);
5219  gshift = stbi__high_bit(mg) - 7; gcount = stbi__bitcount(mg);
5220  bshift = stbi__high_bit(mb) - 7; bcount = stbi__bitcount(mb);
5221  ashift = stbi__high_bit(ma) - 7; acount = stbi__bitcount(ma);
5222  }
5223  for (j = 0; j < (int)s->img_y; ++j) {
5224  if (easy) {
5225  for (i = 0; i < (int)s->img_x; ++i) {
5226  unsigned char a;
5227  out[z + 2] = stbi__get8(s);
5228  out[z + 1] = stbi__get8(s);
5229  out[z + 0] = stbi__get8(s);
5230  z += 3;
5231  a = (easy == 2 ? stbi__get8(s) : 255);
5232  all_a |= a;
5233  if (target == 4) out[z++] = a;
5234  }
5235  }
5236  else {
5237  int bpp = info.bpp;
5238  for (i = 0; i < (int)s->img_x; ++i) {
5239  stbi__uint32 v = (bpp == 16 ? (stbi__uint32)stbi__get16le(s) : stbi__get32le(s));
5240  int a;
5241  out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
5242  out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
5243  out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
5244  a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
5245  all_a |= a;
5246  if (target == 4) out[z++] = STBI__BYTECAST(a);
5247  }
5248  }
5249  stbi__skip(s, pad);
5250  }
5251  }
5252 
5253  // if alpha channel is all 0s, replace with all 255s
5254  if (target == 4 && all_a == 0)
5255  for (i = 4 * s->img_x*s->img_y - 1; i >= 0; i -= 4)
5256  out[i] = 255;
5257 
5258  if (flip_vertically) {
5259  stbi_uc t;
5260  for (j = 0; j < (int)s->img_y >> 1; ++j) {
5261  stbi_uc *p1 = out + j *s->img_x*target;
5262  stbi_uc *p2 = out + (s->img_y - 1 - j)*s->img_x*target;
5263  for (i = 0; i < (int)s->img_x*target; ++i) {
5264  t = p1[i], p1[i] = p2[i], p2[i] = t;
5265  }
5266  }
5267  }
5268 
5269  if (req_comp && req_comp != target) {
5270  out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
5271  if (out == NULL) return out; // stbi__convert_format frees input on failure
5272  }
5273 
5274  *x = s->img_x;
5275  *y = s->img_y;
5276  if (comp) *comp = s->img_n;
5277  return out;
5278 }
5279 #endif
5280 
5281 // Targa Truevision - TGA
5282 // by Jonathan Dummer
5283 #ifndef STBI_NO_TGA
5284 // returns STBI_rgb or whatever, 0 on error
5285 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
5286 {
5287  // only RGB or RGBA (incl. 16bit) or grey allowed
5288  if (is_rgb16) *is_rgb16 = 0;
5289  switch (bits_per_pixel) {
5290  case 8: return STBI_grey;
5291  case 16: if (is_grey) return STBI_grey_alpha;
5292  // else: fall-through
5293  case 15: if (is_rgb16) *is_rgb16 = 1;
5294  return STBI_rgb;
5295  case 24: // fall-through
5296  case 32: return bits_per_pixel / 8;
5297  default: return 0;
5298  }
5299 }
5300 
5301 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
5302 {
5303  int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
5304  int sz, tga_colormap_type;
5305  stbi__get8(s); // discard Offset
5306  tga_colormap_type = stbi__get8(s); // colormap type
5307  if (tga_colormap_type > 1) {
5308  stbi__rewind(s);
5309  return 0; // only RGB or indexed allowed
5310  }
5311  tga_image_type = stbi__get8(s); // image type
5312  if (tga_colormap_type == 1) { // colormapped (paletted) image
5313  if (tga_image_type != 1 && tga_image_type != 9) {
5314  stbi__rewind(s);
5315  return 0;
5316  }
5317  stbi__skip(s, 4); // skip index of first colormap entry and number of entries
5318  sz = stbi__get8(s); // check bits per palette color entry
5319  if ((sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32)) {
5320  stbi__rewind(s);
5321  return 0;
5322  }
5323  stbi__skip(s, 4); // skip image x and y origin
5324  tga_colormap_bpp = sz;
5325  }
5326  else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
5327  if ((tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11)) {
5328  stbi__rewind(s);
5329  return 0; // only RGB or grey allowed, +/- RLE
5330  }
5331  stbi__skip(s, 9); // skip colormap specification and image x/y origin
5332  tga_colormap_bpp = 0;
5333  }
5334  tga_w = stbi__get16le(s);
5335  if (tga_w < 1) {
5336  stbi__rewind(s);
5337  return 0; // test width
5338  }
5339  tga_h = stbi__get16le(s);
5340  if (tga_h < 1) {
5341  stbi__rewind(s);
5342  return 0; // test height
5343  }
5344  tga_bits_per_pixel = stbi__get8(s); // bits per pixel
5345  stbi__get8(s); // ignore alpha bits
5346  if (tga_colormap_bpp != 0) {
5347  if ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
5348  // when using a colormap, tga_bits_per_pixel is the size of the indexes
5349  // I don't think anything but 8 or 16bit indexes makes sense
5350  stbi__rewind(s);
5351  return 0;
5352  }
5353  tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
5354  }
5355  else {
5356  tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
5357  }
5358  if (!tga_comp) {
5359  stbi__rewind(s);
5360  return 0;
5361  }
5362  if (x) *x = tga_w;
5363  if (y) *y = tga_h;
5364  if (comp) *comp = tga_comp;
5365  return 1; // seems to have passed everything
5366 }
5367 
5368 static int stbi__tga_test(stbi__context *s)
5369 {
5370  int res = 0;
5371  int sz, tga_color_type;
5372  stbi__get8(s); // discard Offset
5373  tga_color_type = stbi__get8(s); // color type
5374  if (tga_color_type > 1) goto errorEnd; // only RGB or indexed allowed
5375  sz = stbi__get8(s); // image type
5376  if (tga_color_type == 1) { // colormapped (paletted) image
5377  if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
5378  stbi__skip(s, 4); // skip index of first colormap entry and number of entries
5379  sz = stbi__get8(s); // check bits per palette color entry
5380  if ((sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32)) goto errorEnd;
5381  stbi__skip(s, 4); // skip image x and y origin
5382  }
5383  else { // "normal" image w/o colormap
5384  if ((sz != 2) && (sz != 3) && (sz != 10) && (sz != 11)) goto errorEnd; // only RGB or grey allowed, +/- RLE
5385  stbi__skip(s, 9); // skip colormap specification and image x/y origin
5386  }
5387  if (stbi__get16le(s) < 1) goto errorEnd; // test width
5388  if (stbi__get16le(s) < 1) goto errorEnd; // test height
5389  sz = stbi__get8(s); // bits per pixel
5390  if ((tga_color_type == 1) && (sz != 8) && (sz != 16)) goto errorEnd; // for colormapped images, bpp is size of an index
5391  if ((sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32)) goto errorEnd;
5392 
5393  res = 1; // if we got this far, everything's good and we can return 1 instead of 0
5394 
5395 errorEnd:
5396  stbi__rewind(s);
5397  return res;
5398 }
5399 
5400 // read 16bit value and convert to 24bit RGB
5401 static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
5402 {
5403  stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
5404  stbi__uint16 fiveBitMask = 31;
5405  // we have 3 channels with 5bits each
5406  int r = (px >> 10) & fiveBitMask;
5407  int g = (px >> 5) & fiveBitMask;
5408  int b = px & fiveBitMask;
5409  // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
5410  out[0] = (stbi_uc)((r * 255) / 31);
5411  out[1] = (stbi_uc)((g * 255) / 31);
5412  out[2] = (stbi_uc)((b * 255) / 31);
5413 
5414  // some people claim that the most significant bit might be used for alpha
5415  // (possibly if an alpha-bit is set in the "image descriptor byte")
5416  // but that only made 16bit test images completely translucent..
5417  // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
5418 }
5419 
5420 static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5421 {
5422  // read in the TGA header stuff
5423  int tga_offset = stbi__get8(s);
5424  int tga_indexed = stbi__get8(s);
5425  int tga_image_type = stbi__get8(s);
5426  int tga_is_RLE = 0;
5427  int tga_palette_start = stbi__get16le(s);
5428  int tga_palette_len = stbi__get16le(s);
5429  int tga_palette_bits = stbi__get8(s);
5430  int tga_x_origin = stbi__get16le(s);
5431  int tga_y_origin = stbi__get16le(s);
5432  int tga_width = stbi__get16le(s);
5433  int tga_height = stbi__get16le(s);
5434  int tga_bits_per_pixel = stbi__get8(s);
5435  int tga_comp, tga_rgb16 = 0;
5436  int tga_inverted = stbi__get8(s);
5437  // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
5438  // image data
5439  unsigned char *tga_data;
5440  unsigned char *tga_palette = NULL;
5441  int i, j;
5442  unsigned char raw_data[4];
5443  int RLE_count = 0;
5444  int RLE_repeating = 0;
5445  int read_next_pixel = 1;
5446  STBI_NOTUSED(ri);
5447 
5448  // do a tiny bit of precessing
5449  if (tga_image_type >= 8)
5450  {
5451  tga_image_type -= 8;
5452  tga_is_RLE = 1;
5453  }
5454  tga_inverted = 1 - ((tga_inverted >> 5) & 1);
5455 
5456  // If I'm paletted, then I'll use the number of bits from the palette
5457  if (tga_indexed) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
5458  else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
5459 
5460  if (!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
5461  return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
5462 
5463  // tga info
5464  *x = tga_width;
5465  *y = tga_height;
5466  if (comp) *comp = tga_comp;
5467 
5468  if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
5469  return stbi__errpuc("too large", "Corrupt TGA");
5470 
5471  tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
5472  if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
5473 
5474  // skip to the data's starting position (offset usually = 0)
5475  stbi__skip(s, tga_offset);
5476 
5477  if (!tga_indexed && !tga_is_RLE && !tga_rgb16) {
5478  for (i = 0; i < tga_height; ++i) {
5479  int row = tga_inverted ? tga_height - i - 1 : i;
5480  stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
5481  stbi__getn(s, tga_row, tga_width * tga_comp);
5482  }
5483  }
5484  else {
5485  // do I need to load a palette?
5486  if (tga_indexed)
5487  {
5488  // any data to skip? (offset usually = 0)
5489  stbi__skip(s, tga_palette_start);
5490  // load the palette
5491  tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
5492  if (!tga_palette) {
5493  STBI_FREE(tga_data);
5494  return stbi__errpuc("outofmem", "Out of memory");
5495  }
5496  if (tga_rgb16) {
5497  stbi_uc *pal_entry = tga_palette;
5498  STBI_ASSERT(tga_comp == STBI_rgb);
5499  for (i = 0; i < tga_palette_len; ++i) {
5500  stbi__tga_read_rgb16(s, pal_entry);
5501  pal_entry += tga_comp;
5502  }
5503  }
5504  else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
5505  STBI_FREE(tga_data);
5506  STBI_FREE(tga_palette);
5507  return stbi__errpuc("bad palette", "Corrupt TGA");
5508  }
5509  }
5510  // load the data
5511  for (i = 0; i < tga_width * tga_height; ++i)
5512  {
5513  // if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
5514  if (tga_is_RLE)
5515  {
5516  if (RLE_count == 0)
5517  {
5518  // yep, get the next byte as a RLE command
5519  int RLE_cmd = stbi__get8(s);
5520  RLE_count = 1 + (RLE_cmd & 127);
5521  RLE_repeating = RLE_cmd >> 7;
5522  read_next_pixel = 1;
5523  }
5524  else if (!RLE_repeating)
5525  {
5526  read_next_pixel = 1;
5527  }
5528  }
5529  else
5530  {
5531  read_next_pixel = 1;
5532  }
5533  // OK, if I need to read a pixel, do it now
5534  if (read_next_pixel)
5535  {
5536  // load however much data we did have
5537  if (tga_indexed)
5538  {
5539  // read in index, then perform the lookup
5540  int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
5541  if (pal_idx >= tga_palette_len) {
5542  // invalid index
5543  pal_idx = 0;
5544  }
5545  pal_idx *= tga_comp;
5546  for (j = 0; j < tga_comp; ++j) {
5547  raw_data[j] = tga_palette[pal_idx + j];
5548  }
5549  }
5550  else if (tga_rgb16) {
5551  STBI_ASSERT(tga_comp == STBI_rgb);
5552  stbi__tga_read_rgb16(s, raw_data);
5553  }
5554  else {
5555  // read in the data raw
5556  for (j = 0; j < tga_comp; ++j) {
5557  raw_data[j] = stbi__get8(s);
5558  }
5559  }
5560  // clear the reading flag for the next pixel
5561  read_next_pixel = 0;
5562  } // end of reading a pixel
5563 
5564  // copy data
5565  for (j = 0; j < tga_comp; ++j)
5566  tga_data[i*tga_comp + j] = raw_data[j];
5567 
5568  // in case we're in RLE mode, keep counting down
5569  --RLE_count;
5570  }
5571  // do I need to invert the image?
5572  if (tga_inverted)
5573  {
5574  for (j = 0; j * 2 < tga_height; ++j)
5575  {
5576  int index1 = j * tga_width * tga_comp;
5577  int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
5578  for (i = tga_width * tga_comp; i > 0; --i)
5579  {
5580  unsigned char temp = tga_data[index1];
5581  tga_data[index1] = tga_data[index2];
5582  tga_data[index2] = temp;
5583  ++index1;
5584  ++index2;
5585  }
5586  }
5587  }
5588  // clear my palette, if I had one
5589  if (tga_palette != NULL)
5590  {
5591  STBI_FREE(tga_palette);
5592  }
5593  }
5594 
5595  // swap RGB - if the source data was RGB16, it already is in the right order
5596 #ifndef DIRECTX
5597  if (tga_comp >= 3 && !tga_rgb16)
5598  {
5599  unsigned char* tga_pixel = tga_data;
5600  for (i = 0; i < tga_width * tga_height; ++i)
5601  {
5602  unsigned char temp = tga_pixel[0];
5603  tga_pixel[0] = tga_pixel[2];
5604  tga_pixel[2] = temp;
5605  tga_pixel += tga_comp;
5606  }
5607  }
5608 #endif
5609 
5610  // convert to target component count
5611  if (req_comp && req_comp != tga_comp)
5612  tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
5613 
5614  // the things I do to get rid of an error message, and yet keep
5615  // Microsoft's C compilers happy... [8^(
5616  tga_palette_start = tga_palette_len = tga_palette_bits =
5617  tga_x_origin = tga_y_origin = 0;
5618  // OK, done
5619  return tga_data;
5620 }
5621 #endif
5622 
5623 // *************************************************************************************************
5624 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
5625 
5626 #ifndef STBI_NO_PSD
5627 static int stbi__psd_test(stbi__context *s)
5628 {
5629  int r = (stbi__get32be(s) == 0x38425053);
5630  stbi__rewind(s);
5631  return r;
5632 }
5633 
5634 static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
5635 {
5636  int count, nleft, len;
5637 
5638  count = 0;
5639  while ((nleft = pixelCount - count) > 0) {
5640  len = stbi__get8(s);
5641  if (len == 128) {
5642  // No-op.
5643  }
5644  else if (len < 128) {
5645  // Copy next len+1 bytes literally.
5646  len++;
5647  if (len > nleft) return 0; // corrupt data
5648  count += len;
5649  while (len) {
5650  *p = stbi__get8(s);
5651  p += 4;
5652  len--;
5653  }
5654  }
5655  else if (len > 128) {
5656  stbi_uc val;
5657  // Next -len+1 bytes in the dest are replicated from next source byte.
5658  // (Interpret len as a negative 8-bit int.)
5659  len = 257 - len;
5660  if (len > nleft) return 0; // corrupt data
5661  val = stbi__get8(s);
5662  count += len;
5663  while (len) {
5664  *p = val;
5665  p += 4;
5666  len--;
5667  }
5668  }
5669  }
5670 
5671  return 1;
5672 }
5673 
5674 static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
5675 {
5676  int pixelCount;
5677  int channelCount, compression;
5678  int channel, i;
5679  int bitdepth;
5680  int w, h;
5681  stbi_uc *out;
5682  STBI_NOTUSED(ri);
5683 
5684  // Check identifier
5685  if (stbi__get32be(s) != 0x38425053) // "8BPS"
5686  return stbi__errpuc("not PSD", "Corrupt PSD image");
5687 
5688  // Check file type version.
5689  if (stbi__get16be(s) != 1)
5690  return stbi__errpuc("wrong version", "Unsupported version of PSD image");
5691 
5692  // Skip 6 reserved bytes.
5693  stbi__skip(s, 6);
5694 
5695  // Read the number of channels (R, G, B, A, etc).
5696  channelCount = stbi__get16be(s);
5697  if (channelCount < 0 || channelCount > 16)
5698  return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
5699 
5700  // Read the rows and columns of the image.
5701  h = stbi__get32be(s);
5702  w = stbi__get32be(s);
5703 
5704  // Make sure the depth is 8 bits.
5705  bitdepth = stbi__get16be(s);
5706  if (bitdepth != 8 && bitdepth != 16)
5707  return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
5708 
5709  // Make sure the color mode is RGB.
5710  // Valid options are:
5711  // 0: Bitmap
5712  // 1: Grayscale
5713  // 2: Indexed color
5714  // 3: RGB color
5715  // 4: CMYK color
5716  // 7: Multichannel
5717  // 8: Duotone
5718  // 9: Lab color
5719  if (stbi__get16be(s) != 3)
5720  return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
5721 
5722  // Skip the Mode Data. (It's the palette for indexed color; other info for other modes.)
5723  stbi__skip(s, stbi__get32be(s));
5724 
5725  // Skip the image resources. (resolution, pen tool paths, etc)
5726  stbi__skip(s, stbi__get32be(s));
5727 
5728  // Skip the reserved data.
5729  stbi__skip(s, stbi__get32be(s));
5730 
5731  // Find out if the data is compressed.
5732  // Known values:
5733  // 0: no compression
5734  // 1: RLE compressed
5735  compression = stbi__get16be(s);
5736  if (compression > 1)
5737  return stbi__errpuc("bad compression", "PSD has an unknown compression format");
5738 
5739  // Check size
5740  if (!stbi__mad3sizes_valid(4, w, h, 0))
5741  return stbi__errpuc("too large", "Corrupt PSD");
5742 
5743  // Create the destination image.
5744 
5745  if (!compression && bitdepth == 16 && bpc == 16) {
5746  out = (stbi_uc *)stbi__malloc_mad3(8, w, h, 0);
5747  ri->bits_per_channel = 16;
5748  }
5749  else
5750  out = (stbi_uc *)stbi__malloc(4 * w*h);
5751 
5752  if (!out) return stbi__errpuc("outofmem", "Out of memory");
5753  pixelCount = w*h;
5754 
5755  // Initialize the data to zero.
5756  //memset( out, 0, pixelCount * 4 );
5757 
5758  // Finally, the image data.
5759  if (compression) {
5760  // RLE as used by .PSD and .TIFF
5761  // Loop until you get the number of unpacked bytes you are expecting:
5762  // Read the next source byte into n.
5763  // If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
5764  // Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
5765  // Else if n is 128, noop.
5766  // Endloop
5767 
5768  // The RLE-compressed data is preceeded by a 2-byte data count for each row in the data,
5769  // which we're going to just skip.
5770  stbi__skip(s, h * channelCount * 2);
5771 
5772  // Read the RLE data by channel.
5773  for (channel = 0; channel < 4; channel++) {
5774  stbi_uc *p;
5775 
5776  p = out + channel;
5777  if (channel >= channelCount) {
5778  // Fill this channel with default data.
5779  for (i = 0; i < pixelCount; i++, p += 4)
5780  *p = (channel == 3 ? 255 : 0);
5781  }
5782  else {
5783  // Read the RLE data.
5784  if (!stbi__psd_decode_rle(s, p, pixelCount)) {
5785  STBI_FREE(out);
5786  return stbi__errpuc("corrupt", "bad RLE data");
5787  }
5788  }
5789  }
5790 
5791  }
5792  else {
5793  // We're at the raw image data. It's each channel in order (Red, Green, Blue, Alpha, ...)
5794  // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
5795 
5796  // Read the data by channel.
5797  for (channel = 0; channel < 4; channel++) {
5798  if (channel >= channelCount) {
5799  // Fill this channel with default data.
5800  if (bitdepth == 16 && bpc == 16) {
5801  stbi__uint16 *q = ((stbi__uint16 *)out) + channel;
5802  stbi__uint16 val = channel == 3 ? 65535 : 0;
5803  for (i = 0; i < pixelCount; i++, q += 4)
5804  *q = val;
5805  }
5806  else {
5807  stbi_uc *p = out + channel;
5808  stbi_uc val = channel == 3 ? 255 : 0;
5809  for (i = 0; i < pixelCount; i++, p += 4)
5810  *p = val;
5811  }
5812  }
5813  else {
5814  if (ri->bits_per_channel == 16) { // output bpc
5815  stbi__uint16 *q = ((stbi__uint16 *)out) + channel;
5816  for (i = 0; i < pixelCount; i++, q += 4)
5817  *q = (stbi__uint16)stbi__get16be(s);
5818  }
5819  else {
5820  stbi_uc *p = out + channel;
5821  if (bitdepth == 16) { // input bpc
5822  for (i = 0; i < pixelCount; i++, p += 4)
5823  *p = (stbi_uc)(stbi__get16be(s) >> 8);
5824  }
5825  else {
5826  for (i = 0; i < pixelCount; i++, p += 4)
5827  *p = stbi__get8(s);
5828  }
5829  }
5830  }
5831  }
5832  }
5833 
5834  // remove weird white matte from PSD
5835  if (channelCount >= 4) {
5836  if (ri->bits_per_channel == 16) {
5837  for (i = 0; i < w*h; ++i) {
5838  stbi__uint16 *pixel = (stbi__uint16 *)out + 4 * i;
5839  if (pixel[3] != 0 && pixel[3] != 65535) {
5840  float a = pixel[3] / 65535.0f;
5841  float ra = 1.0f / a;
5842  float inv_a = 65535.0f * (1 - ra);
5843  pixel[0] = (stbi__uint16)(pixel[0] * ra + inv_a);
5844  pixel[1] = (stbi__uint16)(pixel[1] * ra + inv_a);
5845  pixel[2] = (stbi__uint16)(pixel[2] * ra + inv_a);
5846  }
5847  }
5848  }
5849  else {
5850  for (i = 0; i < w*h; ++i) {
5851  unsigned char *pixel = out + 4 * i;
5852  if (pixel[3] != 0 && pixel[3] != 255) {
5853  float a = pixel[3] / 255.0f;
5854  float ra = 1.0f / a;
5855  float inv_a = 255.0f * (1 - ra);
5856  pixel[0] = (unsigned char)(pixel[0] * ra + inv_a);
5857  pixel[1] = (unsigned char)(pixel[1] * ra + inv_a);
5858  pixel[2] = (unsigned char)(pixel[2] * ra + inv_a);
5859  }
5860  }
5861  }
5862  }
5863 
5864  // convert to desired output format
5865  if (req_comp && req_comp != 4) {
5866  if (ri->bits_per_channel == 16)
5867  out = (stbi_uc *)stbi__convert_format16((stbi__uint16 *)out, 4, req_comp, w, h);
5868  else
5869  out = stbi__convert_format(out, 4, req_comp, w, h);
5870  if (out == NULL) return out; // stbi__convert_format frees input on failure
5871  }
5872 
5873  if (comp) *comp = 4;
5874  *y = h;
5875  *x = w;
5876 
5877  return out;
5878 }
5879 #endif
5880 
5881 // *************************************************************************************************
5882 // Softimage PIC loader
5883 // by Tom Seddon
5884 //
5885 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
5886 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
5887 
5888 #ifndef STBI_NO_PIC
5889 static int stbi__pic_is4(stbi__context *s, const char *str)
5890 {
5891  int i;
5892  for (i = 0; i<4; ++i)
5893  if (stbi__get8(s) != (stbi_uc)str[i])
5894  return 0;
5895 
5896  return 1;
5897 }
5898 
5899 static int stbi__pic_test_core(stbi__context *s)
5900 {
5901  int i;
5902 
5903  if (!stbi__pic_is4(s, "\x53\x80\xF6\x34"))
5904  return 0;
5905 
5906  for (i = 0; i<84; ++i)
5907  stbi__get8(s);
5908 
5909  if (!stbi__pic_is4(s, "PICT"))
5910  return 0;
5911 
5912  return 1;
5913 }
5914 
5915 typedef struct
5916 {
5917  stbi_uc size, type, channel;
5918 } stbi__pic_packet;
5919 
5920 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
5921 {
5922  int mask = 0x80, i;
5923 
5924  for (i = 0; i<4; ++i, mask >>= 1) {
5925  if (channel & mask) {
5926  if (stbi__at_eof(s)) return stbi__errpuc("bad file", "PIC file too short");
5927  dest[i] = stbi__get8(s);
5928  }
5929  }
5930 
5931  return dest;
5932 }
5933 
5934 static void stbi__copyval(int channel, stbi_uc *dest, const stbi_uc *src)
5935 {
5936  int mask = 0x80, i;
5937 
5938  for (i = 0; i<4; ++i, mask >>= 1)
5939  if (channel&mask)
5940  dest[i] = src[i];
5941 }
5942 
5943 static stbi_uc *stbi__pic_load_core(stbi__context *s, int width, int height, int *comp, stbi_uc *result)
5944 {
5945  int act_comp = 0, num_packets = 0, y, chained;
5946  stbi__pic_packet packets[10];
5947 
5948  // this will (should...) cater for even some bizarre stuff like having data
5949  // for the same channel in multiple packets.
5950  do {
5951  stbi__pic_packet *packet;
5952 
5953  if (num_packets == sizeof(packets) / sizeof(packets[0]))
5954  return stbi__errpuc("bad format", "too many packets");
5955 
5956  packet = &packets[num_packets++];
5957 
5958  chained = stbi__get8(s);
5959  packet->size = stbi__get8(s);
5960  packet->type = stbi__get8(s);
5961  packet->channel = stbi__get8(s);
5962 
5963  act_comp |= packet->channel;
5964 
5965  if (stbi__at_eof(s)) return stbi__errpuc("bad file", "file too short (reading packets)");
5966  if (packet->size != 8) return stbi__errpuc("bad format", "packet isn't 8bpp");
5967  } while (chained);
5968 
5969  *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
5970 
5971  for (y = 0; y<height; ++y) {
5972  int packet_idx;
5973 
5974  for (packet_idx = 0; packet_idx < num_packets; ++packet_idx) {
5975  stbi__pic_packet *packet = &packets[packet_idx];
5976  stbi_uc *dest = result + y*width * 4;
5977 
5978  switch (packet->type) {
5979  default:
5980  return stbi__errpuc("bad format", "packet has bad compression type");
5981 
5982  case 0: {//uncompressed
5983  int x;
5984 
5985  for (x = 0; x<width; ++x, dest += 4)
5986  if (!stbi__readval(s, packet->channel, dest))
5987  return 0;
5988  break;
5989  }
5990 
5991  case 1://Pure RLE
5992  {
5993  int left = width, i;
5994 
5995  while (left>0) {
5996  stbi_uc count, value[4];
5997 
5998  count = stbi__get8(s);
5999  if (stbi__at_eof(s)) return stbi__errpuc("bad file", "file too short (pure read count)");
6000 
6001  if (count > left)
6002  count = (stbi_uc)left;
6003 
6004  if (!stbi__readval(s, packet->channel, value)) return 0;
6005 
6006  for (i = 0; i<count; ++i, dest += 4)
6007  stbi__copyval(packet->channel, dest, value);
6008  left -= count;
6009  }
6010  }
6011  break;
6012 
6013  case 2: {//Mixed RLE
6014  int left = width;
6015  while (left>0) {
6016  int count = stbi__get8(s), i;
6017  if (stbi__at_eof(s)) return stbi__errpuc("bad file", "file too short (mixed read count)");
6018 
6019  if (count >= 128) { // Repeated
6020  stbi_uc value[4];
6021 
6022  if (count == 128)
6023  count = stbi__get16be(s);
6024  else
6025  count -= 127;
6026  if (count > left)
6027  return stbi__errpuc("bad file", "scanline overrun");
6028 
6029  if (!stbi__readval(s, packet->channel, value))
6030  return 0;
6031 
6032  for (i = 0; i<count; ++i, dest += 4)
6033  stbi__copyval(packet->channel, dest, value);
6034  }
6035  else { // Raw
6036  ++count;
6037  if (count>left) return stbi__errpuc("bad file", "scanline overrun");
6038 
6039  for (i = 0; i<count; ++i, dest += 4)
6040  if (!stbi__readval(s, packet->channel, dest))
6041  return 0;
6042  }
6043  left -= count;
6044  }
6045  break;
6046  }
6047  }
6048  }
6049  }
6050 
6051  return result;
6052 }
6053 
6054 static void *stbi__pic_load(stbi__context *s, int *px, int *py, int *comp, int req_comp, stbi__result_info *ri)
6055 {
6056  stbi_uc *result;
6057  int i, x, y;
6058  STBI_NOTUSED(ri);
6059 
6060  for (i = 0; i<92; ++i)
6061  stbi__get8(s);
6062 
6063  x = stbi__get16be(s);
6064  y = stbi__get16be(s);
6065  if (stbi__at_eof(s)) return stbi__errpuc("bad file", "file too short (pic header)");
6066  if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
6067 
6068  stbi__get32be(s); //skip `ratio'
6069  stbi__get16be(s); //skip `fields'
6070  stbi__get16be(s); //skip `pad'
6071 
6072  // intermediate buffer is RGBA
6073  result = (stbi_uc *)stbi__malloc_mad3(x, y, 4, 0);
6074  memset(result, 0xff, x*y * 4);
6075 
6076  if (!stbi__pic_load_core(s, x, y, comp, result)) {
6077  STBI_FREE(result);
6078  result = 0;
6079  }
6080  *px = x;
6081  *py = y;
6082  if (req_comp == 0) req_comp = *comp;
6083  result = stbi__convert_format(result, 4, req_comp, x, y);
6084 
6085  return result;
6086 }
6087 
6088 static int stbi__pic_test(stbi__context *s)
6089 {
6090  int r = stbi__pic_test_core(s);
6091  stbi__rewind(s);
6092  return r;
6093 }
6094 #endif
6095 
6096 // *************************************************************************************************
6097 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
6098 
6099 #ifndef STBI_NO_GIF
6100 typedef struct
6101 {
6102  stbi__int16 prefix;
6103  stbi_uc first;
6104  stbi_uc suffix;
6105 } stbi__gif_lzw;
6106 
6107 typedef struct
6108 {
6109  int w, h;
6110  stbi_uc *out, *old_out; // output buffer (always 4 components)
6111  int flags, bgindex, ratio, transparent, eflags, delay;
6112  stbi_uc pal[256][4];
6113  stbi_uc lpal[256][4];
6114  stbi__gif_lzw codes[4096];
6115  stbi_uc *color_table;
6116  int parse, step;
6117  int lflags;
6118  int start_x, start_y;
6119  int max_x, max_y;
6120  int cur_x, cur_y;
6121  int line_size;
6122 } stbi__gif;
6123 
6124 static int stbi__gif_test_raw(stbi__context *s)
6125 {
6126  int sz;
6127  if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
6128  sz = stbi__get8(s);
6129  if (sz != '9' && sz != '7') return 0;
6130  if (stbi__get8(s) != 'a') return 0;
6131  return 1;
6132 }
6133 
6134 static int stbi__gif_test(stbi__context *s)
6135 {
6136  int r = stbi__gif_test_raw(s);
6137  stbi__rewind(s);
6138  return r;
6139 }
6140 
6141 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
6142 {
6143  int i;
6144  for (i = 0; i < num_entries; ++i) {
6145  pal[i][2] = stbi__get8(s);
6146  pal[i][1] = stbi__get8(s);
6147  pal[i][0] = stbi__get8(s);
6148  pal[i][3] = transp == i ? 0 : 255;
6149  }
6150 }
6151 
6152 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
6153 {
6154  stbi_uc version;
6155  if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
6156  return stbi__err("not GIF", "Corrupt GIF");
6157 
6158  version = stbi__get8(s);
6159  if (version != '7' && version != '9') return stbi__err("not GIF", "Corrupt GIF");
6160  if (stbi__get8(s) != 'a') return stbi__err("not GIF", "Corrupt GIF");
6161 
6162  stbi__g_failure_reason = "";
6163  g->w = stbi__get16le(s);
6164  g->h = stbi__get16le(s);
6165  g->flags = stbi__get8(s);
6166  g->bgindex = stbi__get8(s);
6167  g->ratio = stbi__get8(s);
6168  g->transparent = -1;
6169 
6170  if (comp != 0) *comp = 4; // can't actually tell whether it's 3 or 4 until we parse the comments
6171 
6172  if (is_info) return 1;
6173 
6174  if (g->flags & 0x80)
6175  stbi__gif_parse_colortable(s, g->pal, 2 << (g->flags & 7), -1);
6176 
6177  return 1;
6178 }
6179 
6180 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
6181 {
6182  stbi__gif* g = (stbi__gif*)stbi__malloc(sizeof(stbi__gif));
6183  if (!stbi__gif_header(s, g, comp, 1)) {
6184  STBI_FREE(g);
6185  stbi__rewind(s);
6186  return 0;
6187  }
6188  if (x) *x = g->w;
6189  if (y) *y = g->h;
6190  STBI_FREE(g);
6191  return 1;
6192 }
6193 
6194 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
6195 {
6196  stbi_uc *p, *c;
6197 
6198  // recurse to decode the prefixes, since the linked-list is backwards,
6199  // and working backwards through an interleaved image would be nasty
6200  if (g->codes[code].prefix >= 0)
6201  stbi__out_gif_code(g, g->codes[code].prefix);
6202 
6203  if (g->cur_y >= g->max_y) return;
6204 
6205  p = &g->out[g->cur_x + g->cur_y];
6206  c = &g->color_table[g->codes[code].suffix * 4];
6207 
6208  if (c[3] >= 128) {
6209  p[0] = c[2];
6210  p[1] = c[1];
6211  p[2] = c[0];
6212  p[3] = c[3];
6213  }
6214  g->cur_x += 4;
6215 
6216  if (g->cur_x >= g->max_x) {
6217  g->cur_x = g->start_x;
6218  g->cur_y += g->step;
6219 
6220  while (g->cur_y >= g->max_y && g->parse > 0) {
6221  g->step = (1 << g->parse) * g->line_size;
6222  g->cur_y = g->start_y + (g->step >> 1);
6223  --g->parse;
6224  }
6225  }
6226 }
6227 
6228 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
6229 {
6230  stbi_uc lzw_cs;
6231  stbi__int32 len, init_code;
6232  stbi__uint32 first;
6233  stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
6234  stbi__gif_lzw *p;
6235 
6236  lzw_cs = stbi__get8(s);
6237  if (lzw_cs > 12) return NULL;
6238  clear = 1 << lzw_cs;
6239  first = 1;
6240  codesize = lzw_cs + 1;
6241  codemask = (1 << codesize) - 1;
6242  bits = 0;
6243  valid_bits = 0;
6244  for (init_code = 0; init_code < clear; init_code++) {
6245  g->codes[init_code].prefix = -1;
6246  g->codes[init_code].first = (stbi_uc)init_code;
6247  g->codes[init_code].suffix = (stbi_uc)init_code;
6248  }
6249 
6250  // support no starting clear code
6251  avail = clear + 2;
6252  oldcode = -1;
6253 
6254  len = 0;
6255  for (;;) {
6256  if (valid_bits < codesize) {
6257  if (len == 0) {
6258  len = stbi__get8(s); // start new block
6259  if (len == 0)
6260  return g->out;
6261  }
6262  --len;
6263  bits |= (stbi__int32)stbi__get8(s) << valid_bits;
6264  valid_bits += 8;
6265  }
6266  else {
6267  stbi__int32 code = bits & codemask;
6268  bits >>= codesize;
6269  valid_bits -= codesize;
6270  // @OPTIMIZE: is there some way we can accelerate the non-clear path?
6271  if (code == clear) { // clear code
6272  codesize = lzw_cs + 1;
6273  codemask = (1 << codesize) - 1;
6274  avail = clear + 2;
6275  oldcode = -1;
6276  first = 0;
6277  }
6278  else if (code == clear + 1) { // end of stream code
6279  stbi__skip(s, len);
6280  while ((len = stbi__get8(s)) > 0)
6281  stbi__skip(s, len);
6282  return g->out;
6283  }
6284  else if (code <= avail) {
6285  if (first) return stbi__errpuc("no clear code", "Corrupt GIF");
6286 
6287  if (oldcode >= 0) {
6288  p = &g->codes[avail++];
6289  if (avail > 4096) return stbi__errpuc("too many codes", "Corrupt GIF");
6290  p->prefix = (stbi__int16)oldcode;
6291  p->first = g->codes[oldcode].first;
6292  p->suffix = (code == avail) ? p->first : g->codes[code].first;
6293  }
6294  else if (code == avail)
6295  return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6296 
6297  stbi__out_gif_code(g, (stbi__uint16)code);
6298 
6299  if ((avail & codemask) == 0 && avail <= 0x0FFF) {
6300  codesize++;
6301  codemask = (1 << codesize) - 1;
6302  }
6303 
6304  oldcode = code;
6305  }
6306  else {
6307  return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6308  }
6309  }
6310  }
6311 }
6312 
6313 static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1)
6314 {
6315  int x, y;
6316  stbi_uc *c = g->pal[g->bgindex];
6317  for (y = y0; y < y1; y += 4 * g->w) {
6318  for (x = x0; x < x1; x += 4) {
6319  stbi_uc *p = &g->out[y + x];
6320  p[0] = c[2];
6321  p[1] = c[1];
6322  p[2] = c[0];
6323  p[3] = 0;
6324  }
6325  }
6326 }
6327 
6328 // this function is designed to support animated gifs, although stb_image doesn't support it
6329 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
6330 {
6331  int i;
6332  stbi_uc *prev_out = 0;
6333 
6334  if (g->out == 0 && !stbi__gif_header(s, g, comp, 0))
6335  return 0; // stbi__g_failure_reason set by stbi__gif_header
6336 
6337  if (!stbi__mad3sizes_valid(g->w, g->h, 4, 0))
6338  return stbi__errpuc("too large", "GIF too large");
6339 
6340  prev_out = g->out;
6341  g->out = (stbi_uc *)stbi__malloc_mad3(4, g->w, g->h, 0);
6342  if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
6343 
6344  switch ((g->eflags & 0x1C) >> 2) {
6345  case 0: // unspecified (also always used on 1st frame)
6346  stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h);
6347  break;
6348  case 1: // do not dispose
6349  if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
6350  g->old_out = prev_out;
6351  break;
6352  case 2: // dispose to background
6353  if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
6354  stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y);
6355  break;
6356  case 3: // dispose to previous
6357  if (g->old_out) {
6358  for (i = g->start_y; i < g->max_y; i += 4 * g->w)
6359  memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x);
6360  }
6361  break;
6362  }
6363 
6364  for (;;) {
6365  switch (stbi__get8(s)) {
6366  case 0x2C: /* Image Descriptor */
6367  {
6368  int prev_trans = -1;
6369  stbi__int32 x, y, w, h;
6370  stbi_uc *o;
6371 
6372  x = stbi__get16le(s);
6373  y = stbi__get16le(s);
6374  w = stbi__get16le(s);
6375  h = stbi__get16le(s);
6376  if (((x + w) > (g->w)) || ((y + h) > (g->h)))
6377  return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
6378 
6379  g->line_size = g->w * 4;
6380  g->start_x = x * 4;
6381  g->start_y = y * g->line_size;
6382  g->max_x = g->start_x + w * 4;
6383  g->max_y = g->start_y + h * g->line_size;
6384  g->cur_x = g->start_x;
6385  g->cur_y = g->start_y;
6386 
6387  g->lflags = stbi__get8(s);
6388 
6389  if (g->lflags & 0x40) {
6390  g->step = 8 * g->line_size; // first interlaced spacing
6391  g->parse = 3;
6392  }
6393  else {
6394  g->step = g->line_size;
6395  g->parse = 0;
6396  }
6397 
6398  if (g->lflags & 0x80) {
6399  stbi__gif_parse_colortable(s, g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
6400  g->color_table = (stbi_uc *)g->lpal;
6401  }
6402  else if (g->flags & 0x80) {
6403  if (g->transparent >= 0 && (g->eflags & 0x01)) {
6404  prev_trans = g->pal[g->transparent][3];
6405  g->pal[g->transparent][3] = 0;
6406  }
6407  g->color_table = (stbi_uc *)g->pal;
6408  }
6409  else
6410  return stbi__errpuc("missing color table", "Corrupt GIF");
6411 
6412  o = stbi__process_gif_raster(s, g);
6413  if (o == NULL) return NULL;
6414 
6415  if (prev_trans != -1)
6416  g->pal[g->transparent][3] = (stbi_uc)prev_trans;
6417 
6418  return o;
6419  }
6420 
6421  case 0x21: // Comment Extension.
6422  {
6423  int len;
6424  if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
6425  len = stbi__get8(s);
6426  if (len == 4) {
6427  g->eflags = stbi__get8(s);
6428  g->delay = stbi__get16le(s);
6429  g->transparent = stbi__get8(s);
6430  }
6431  else {
6432  stbi__skip(s, len);
6433  break;
6434  }
6435  }
6436  while ((len = stbi__get8(s)) != 0)
6437  stbi__skip(s, len);
6438  break;
6439  }
6440 
6441  case 0x3B: // gif stream termination code
6442  return (stbi_uc *)s; // using '1' causes warning on some compilers
6443 
6444  default:
6445  return stbi__errpuc("unknown code", "Corrupt GIF");
6446  }
6447  }
6448 
6449  STBI_NOTUSED(req_comp);
6450 }
6451 
6452 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
6453 {
6454  stbi_uc *u = 0;
6455  stbi__gif* g = (stbi__gif*)stbi__malloc(sizeof(stbi__gif));
6456  memset(g, 0, sizeof(*g));
6457  STBI_NOTUSED(ri);
6458 
6459  u = stbi__gif_load_next(s, g, comp, req_comp);
6460  if (u == (stbi_uc *)s) u = 0; // end of animated gif marker
6461  if (u) {
6462  *x = g->w;
6463  *y = g->h;
6464  if (req_comp && req_comp != 4)
6465  u = stbi__convert_format(u, 4, req_comp, g->w, g->h);
6466  }
6467  else if (g->out)
6468  STBI_FREE(g->out);
6469  STBI_FREE(g);
6470  return u;
6471 }
6472 
6473 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
6474 {
6475  return stbi__gif_info_raw(s, x, y, comp);
6476 }
6477 #endif
6478 
6479 // *************************************************************************************************
6480 // Radiance RGBE HDR loader
6481 // originally by Nicolas Schulz
6482 #ifndef STBI_NO_HDR
6483 static int stbi__hdr_test_core(stbi__context *s, const char *signature)
6484 {
6485  int i;
6486  for (i = 0; signature[i]; ++i)
6487  if (stbi__get8(s) != signature[i])
6488  return 0;
6489  stbi__rewind(s);
6490  return 1;
6491 }
6492 
6493 static int stbi__hdr_test(stbi__context* s)
6494 {
6495  int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
6496  stbi__rewind(s);
6497  if (!r) {
6498  r = stbi__hdr_test_core(s, "#?RGBE\n");
6499  stbi__rewind(s);
6500  }
6501  return r;
6502 }
6503 
6504 #define STBI__HDR_BUFLEN 1024
6505 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
6506 {
6507  int len = 0;
6508  char c = '\0';
6509 
6510  c = (char)stbi__get8(z);
6511 
6512  while (!stbi__at_eof(z) && c != '\n') {
6513  buffer[len++] = c;
6514  if (len == STBI__HDR_BUFLEN - 1) {
6515  // flush to end of line
6516  while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
6517  ;
6518  break;
6519  }
6520  c = (char)stbi__get8(z);
6521  }
6522 
6523  buffer[len] = 0;
6524  return buffer;
6525 }
6526 
6527 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
6528 {
6529  if (input[3] != 0) {
6530  float f1;
6531  // Exponent
6532  f1 = (float)ldexp(1.0f, input[3] - (int)(128 + 8));
6533  if (req_comp <= 2)
6534  output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
6535  else {
6536  output[0] = input[0] * f1;
6537  output[1] = input[1] * f1;
6538  output[2] = input[2] * f1;
6539  }
6540  if (req_comp == 2) output[1] = 1;
6541  if (req_comp == 4) output[3] = 1;
6542  }
6543  else {
6544  switch (req_comp) {
6545  case 4: output[3] = 1; /* fallthrough */
6546  case 3: output[0] = output[1] = output[2] = 0;
6547  break;
6548  case 2: output[1] = 1; /* fallthrough */
6549  case 1: output[0] = 0;
6550  break;
6551  }
6552  }
6553 }
6554 
6555 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
6556 {
6557  char buffer[STBI__HDR_BUFLEN];
6558  char *token;
6559  int valid = 0;
6560  int width, height;
6561  stbi_uc *scanline;
6562  float *hdr_data;
6563  int len;
6564  unsigned char count, value;
6565  int i, j, k, c1, c2, z;
6566  const char *headerToken;
6567  STBI_NOTUSED(ri);
6568 
6569  // Check identifier
6570  headerToken = stbi__hdr_gettoken(s, buffer);
6571  if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
6572  return stbi__errpf("not HDR", "Corrupt HDR image");
6573 
6574  // Parse header
6575  for (;;) {
6576  token = stbi__hdr_gettoken(s, buffer);
6577  if (token[0] == 0) break;
6578  if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
6579  }
6580 
6581  if (!valid) return stbi__errpf("unsupported format", "Unsupported HDR format");
6582 
6583  // Parse width and height
6584  // can't use sscanf() if we're not using stdio!
6585  token = stbi__hdr_gettoken(s, buffer);
6586  if (strncmp(token, "-Y ", 3)) return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6587  token += 3;
6588  height = (int)strtol(token, &token, 10);
6589  while (*token == ' ') ++token;
6590  if (strncmp(token, "+X ", 3)) return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6591  token += 3;
6592  width = (int)strtol(token, NULL, 10);
6593 
6594  *x = width;
6595  *y = height;
6596 
6597  if (comp) *comp = 3;
6598  if (req_comp == 0) req_comp = 3;
6599 
6600  if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
6601  return stbi__errpf("too large", "HDR image is too large");
6602 
6603  // Read data
6604  hdr_data = (float *)stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
6605  if (!hdr_data)
6606  return stbi__errpf("outofmem", "Out of memory");
6607 
6608  // Load image data
6609  // image data is stored as some number of sca
6610  if (width < 8 || width >= 32768) {
6611  // Read flat data
6612  for (j = 0; j < height; ++j) {
6613  for (i = 0; i < width; ++i) {
6614  stbi_uc rgbe[4];
6615  main_decode_loop:
6616  stbi__getn(s, rgbe, 4);
6617  stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
6618  }
6619  }
6620  }
6621  else {
6622  // Read RLE-encoded data
6623  scanline = NULL;
6624 
6625  for (j = 0; j < height; ++j) {
6626  c1 = stbi__get8(s);
6627  c2 = stbi__get8(s);
6628  len = stbi__get8(s);
6629  if (c1 != 2 || c2 != 2 || (len & 0x80)) {
6630  // not run-length encoded, so we have to actually use THIS data as a decoded
6631  // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
6632  stbi_uc rgbe[4];
6633  rgbe[0] = (stbi_uc)c1;
6634  rgbe[1] = (stbi_uc)c2;
6635  rgbe[2] = (stbi_uc)len;
6636  rgbe[3] = (stbi_uc)stbi__get8(s);
6637  stbi__hdr_convert(hdr_data, rgbe, req_comp);
6638  i = 1;
6639  j = 0;
6640  STBI_FREE(scanline);
6641  goto main_decode_loop; // yes, this makes no sense
6642  }
6643  len <<= 8;
6644  len |= stbi__get8(s);
6645  if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
6646  if (scanline == NULL) {
6647  scanline = (stbi_uc *)stbi__malloc_mad2(width, 4, 0);
6648  if (!scanline) {
6649  STBI_FREE(hdr_data);
6650  return stbi__errpf("outofmem", "Out of memory");
6651  }
6652  }
6653 
6654  for (k = 0; k < 4; ++k) {
6655  int nleft;
6656  i = 0;
6657  while ((nleft = width - i) > 0) {
6658  count = stbi__get8(s);
6659  if (count > 128) {
6660  // Run
6661  value = stbi__get8(s);
6662  count -= 128;
6663  if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
6664  for (z = 0; z < count; ++z)
6665  scanline[i++ * 4 + k] = value;
6666  }
6667  else {
6668  // Dump
6669  if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
6670  for (z = 0; z < count; ++z)
6671  scanline[i++ * 4 + k] = stbi__get8(s);
6672  }
6673  }
6674  }
6675  for (i = 0; i < width; ++i)
6676  stbi__hdr_convert(hdr_data + (j*width + i)*req_comp, scanline + i * 4, req_comp);
6677  }
6678  if (scanline)
6679  STBI_FREE(scanline);
6680  }
6681 
6682  return hdr_data;
6683 }
6684 
6685 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
6686 {
6687  char buffer[STBI__HDR_BUFLEN];
6688  char *token;
6689  int valid = 0;
6690 
6691  if (stbi__hdr_test(s) == 0) {
6692  stbi__rewind(s);
6693  return 0;
6694  }
6695 
6696  for (;;) {
6697  token = stbi__hdr_gettoken(s, buffer);
6698  if (token[0] == 0) break;
6699  if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
6700  }
6701 
6702  if (!valid) {
6703  stbi__rewind(s);
6704  return 0;
6705  }
6706  token = stbi__hdr_gettoken(s, buffer);
6707  if (strncmp(token, "-Y ", 3)) {
6708  stbi__rewind(s);
6709  return 0;
6710  }
6711  token += 3;
6712  *y = (int)strtol(token, &token, 10);
6713  while (*token == ' ') ++token;
6714  if (strncmp(token, "+X ", 3)) {
6715  stbi__rewind(s);
6716  return 0;
6717  }
6718  token += 3;
6719  *x = (int)strtol(token, NULL, 10);
6720  *comp = 3;
6721  return 1;
6722 }
6723 #endif // STBI_NO_HDR
6724 
6725 #ifndef STBI_NO_BMP
6726 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
6727 {
6728  void *p;
6729  stbi__bmp_data info;
6730 
6731  info.all_a = 255;
6732  p = stbi__bmp_parse_header(s, &info);
6733  stbi__rewind(s);
6734  if (p == NULL)
6735  return 0;
6736  *x = s->img_x;
6737  *y = s->img_y;
6738  *comp = info.ma ? 4 : 3;
6739  return 1;
6740 }
6741 #endif
6742 
6743 #ifndef STBI_NO_PSD
6744 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
6745 {
6746  int channelCount;
6747  if (stbi__get32be(s) != 0x38425053) {
6748  stbi__rewind(s);
6749  return 0;
6750  }
6751  if (stbi__get16be(s) != 1) {
6752  stbi__rewind(s);
6753  return 0;
6754  }
6755  stbi__skip(s, 6);
6756  channelCount = stbi__get16be(s);
6757  if (channelCount < 0 || channelCount > 16) {
6758  stbi__rewind(s);
6759  return 0;
6760  }
6761  *y = stbi__get32be(s);
6762  *x = stbi__get32be(s);
6763  if (stbi__get16be(s) != 8) {
6764  stbi__rewind(s);
6765  return 0;
6766  }
6767  if (stbi__get16be(s) != 3) {
6768  stbi__rewind(s);
6769  return 0;
6770  }
6771  *comp = 4;
6772  return 1;
6773 }
6774 #endif
6775 
6776 #ifndef STBI_NO_PIC
6777 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
6778 {
6779  int act_comp = 0, num_packets = 0, chained;
6780  stbi__pic_packet packets[10];
6781 
6782  if (!stbi__pic_is4(s, "\x53\x80\xF6\x34")) {
6783  stbi__rewind(s);
6784  return 0;
6785  }
6786 
6787  stbi__skip(s, 88);
6788 
6789  *x = stbi__get16be(s);
6790  *y = stbi__get16be(s);
6791  if (stbi__at_eof(s)) {
6792  stbi__rewind(s);
6793  return 0;
6794  }
6795  if ((*x) != 0 && (1 << 28) / (*x) < (*y)) {
6796  stbi__rewind(s);
6797  return 0;
6798  }
6799 
6800  stbi__skip(s, 8);
6801 
6802  do {
6803  stbi__pic_packet *packet;
6804 
6805  if (num_packets == sizeof(packets) / sizeof(packets[0]))
6806  return 0;
6807 
6808  packet = &packets[num_packets++];
6809  chained = stbi__get8(s);
6810  packet->size = stbi__get8(s);
6811  packet->type = stbi__get8(s);
6812  packet->channel = stbi__get8(s);
6813  act_comp |= packet->channel;
6814 
6815  if (stbi__at_eof(s)) {
6816  stbi__rewind(s);
6817  return 0;
6818  }
6819  if (packet->size != 8) {
6820  stbi__rewind(s);
6821  return 0;
6822  }
6823  } while (chained);
6824 
6825  *comp = (act_comp & 0x10 ? 4 : 3);
6826 
6827  return 1;
6828 }
6829 #endif
6830 
6831 // *************************************************************************************************
6832 // Portable Gray Map and Portable Pixel Map loader
6833 // by Ken Miller
6834 //
6835 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
6836 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
6837 //
6838 // Known limitations:
6839 // Does not support comments in the header section
6840 // Does not support ASCII image data (formats P2 and P3)
6841 // Does not support 16-bit-per-channel
6842 
6843 #ifndef STBI_NO_PNM
6844 
6845 static int stbi__pnm_test(stbi__context *s)
6846 {
6847  char p, t;
6848  p = (char)stbi__get8(s);
6849  t = (char)stbi__get8(s);
6850  if (p != 'P' || (t != '5' && t != '6')) {
6851  stbi__rewind(s);
6852  return 0;
6853  }
6854  return 1;
6855 }
6856 
6857 static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
6858 {
6859  stbi_uc *out;
6860  STBI_NOTUSED(ri);
6861 
6862  if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
6863  return 0;
6864 
6865  *x = s->img_x;
6866  *y = s->img_y;
6867  *comp = s->img_n;
6868 
6869  if (!stbi__mad3sizes_valid(s->img_n, s->img_x, s->img_y, 0))
6870  return stbi__errpuc("too large", "PNM too large");
6871 
6872  out = (stbi_uc *)stbi__malloc_mad3(s->img_n, s->img_x, s->img_y, 0);
6873  if (!out) return stbi__errpuc("outofmem", "Out of memory");
6874  stbi__getn(s, out, s->img_n * s->img_x * s->img_y);
6875 
6876  if (req_comp && req_comp != s->img_n) {
6877  out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
6878  if (out == NULL) return out; // stbi__convert_format frees input on failure
6879  }
6880  return out;
6881 }
6882 
6883 static int stbi__pnm_isspace(char c)
6884 {
6885  return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
6886 }
6887 
6888 static void stbi__pnm_skip_whitespace(stbi__context *s, char *c)
6889 {
6890  for (;;) {
6891  while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
6892  *c = (char)stbi__get8(s);
6893 
6894  if (stbi__at_eof(s) || *c != '#')
6895  break;
6896 
6897  while (!stbi__at_eof(s) && *c != '\n' && *c != '\r')
6898  *c = (char)stbi__get8(s);
6899  }
6900 }
6901 
6902 static int stbi__pnm_isdigit(char c)
6903 {
6904  return c >= '0' && c <= '9';
6905 }
6906 
6907 static int stbi__pnm_getinteger(stbi__context *s, char *c)
6908 {
6909  int value = 0;
6910 
6911  while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
6912  value = value * 10 + (*c - '0');
6913  *c = (char)stbi__get8(s);
6914  }
6915 
6916  return value;
6917 }
6918 
6919 static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
6920 {
6921  int maxv;
6922  char c, p, t;
6923 
6924  stbi__rewind(s);
6925 
6926  // Get identifier
6927  p = (char)stbi__get8(s);
6928  t = (char)stbi__get8(s);
6929  if (p != 'P' || (t != '5' && t != '6')) {
6930  stbi__rewind(s);
6931  return 0;
6932  }
6933 
6934  *comp = (t == '6') ? 3 : 1; // '5' is 1-component .pgm; '6' is 3-component .ppm
6935 
6936  c = (char)stbi__get8(s);
6937  stbi__pnm_skip_whitespace(s, &c);
6938 
6939  *x = stbi__pnm_getinteger(s, &c); // read width
6940  stbi__pnm_skip_whitespace(s, &c);
6941 
6942  *y = stbi__pnm_getinteger(s, &c); // read height
6943  stbi__pnm_skip_whitespace(s, &c);
6944 
6945  maxv = stbi__pnm_getinteger(s, &c); // read max value
6946 
6947  if (maxv > 255)
6948  return stbi__err("max value > 255", "PPM image not 8-bit");
6949  else
6950  return 1;
6951 }
6952 #endif
6953 
6954 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
6955 {
6956 #ifndef STBI_NO_JPEG
6957  if (stbi__jpeg_info(s, x, y, comp)) return 1;
6958 #endif
6959 
6960 #ifndef STBI_NO_PNG
6961  if (stbi__png_info(s, x, y, comp)) return 1;
6962 #endif
6963 
6964 #ifndef STBI_NO_GIF
6965  if (stbi__gif_info(s, x, y, comp)) return 1;
6966 #endif
6967 
6968 #ifndef STBI_NO_BMP
6969  if (stbi__bmp_info(s, x, y, comp)) return 1;
6970 #endif
6971 
6972 #ifndef STBI_NO_PSD
6973  if (stbi__psd_info(s, x, y, comp)) return 1;
6974 #endif
6975 
6976 #ifndef STBI_NO_PIC
6977  if (stbi__pic_info(s, x, y, comp)) return 1;
6978 #endif
6979 
6980 #ifndef STBI_NO_PNM
6981  if (stbi__pnm_info(s, x, y, comp)) return 1;
6982 #endif
6983 
6984 #ifndef STBI_NO_HDR
6985  if (stbi__hdr_info(s, x, y, comp)) return 1;
6986 #endif
6987 
6988  // test tga last because it's a crappy test!
6989 #ifndef STBI_NO_TGA
6990  if (stbi__tga_info(s, x, y, comp))
6991  return 1;
6992 #endif
6993  return stbi__err("unknown image type", "Image not of any known type, or corrupt");
6994 }
6995 
6996 #ifndef STBI_NO_STDIO
6997 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
6998 {
6999  FILE *f = stbi__fopen(filename, "rb");
7000  int result;
7001  if (!f) return stbi__err("can't fopen", "Unable to open file");
7002  result = stbi_info_from_file(f, x, y, comp);
7003  fclose(f);
7004  return result;
7005 }
7006 
7007 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
7008 {
7009  int r;
7010  stbi__context s;
7011  long pos = ftell(f);
7012  stbi__start_file(&s, f);
7013  r = stbi__info_main(&s, x, y, comp);
7014  fseek(f, pos, SEEK_SET);
7015  return r;
7016 }
7017 #endif // !STBI_NO_STDIO
7018 
7019 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
7020 {
7021  stbi__context s;
7022  stbi__start_mem(&s, buffer, len);
7023  return stbi__info_main(&s, x, y, comp);
7024 }
7025 
7026 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
7027 {
7028  stbi__context s;
7029  stbi__start_callbacks(&s, (stbi_io_callbacks *)c, user);
7030  return stbi__info_main(&s, x, y, comp);
7031 }
7032 
7033 #endif // STB_IMAGE_IMPLEMENTATION
7034 
7035 /*
7036 revision history:
7037 2.13 (2016-11-29) add 16-bit API, only supported for PNG right now
7038 2.12 (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
7039 2.11 (2016-04-02) allocate large structures on the stack
7040 remove white matting for transparent PSD
7041 fix reported channel count for PNG & BMP
7042 re-enable SSE2 in non-gcc 64-bit
7043 support RGB-formatted JPEG
7044 read 16-bit PNGs (only as 8-bit)
7045 2.10 (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
7046 2.09 (2016-01-16) allow comments in PNM files
7047 16-bit-per-pixel TGA (not bit-per-component)
7048 info() for TGA could break due to .hdr handling
7049 info() for BMP to shares code instead of sloppy parse
7050 can use STBI_REALLOC_SIZED if allocator doesn't support realloc
7051 code cleanup
7052 2.08 (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
7053 2.07 (2015-09-13) fix compiler warnings
7054 partial animated GIF support
7055 limited 16-bpc PSD support
7056 #ifdef unused functions
7057 bug with < 92 byte PIC,PNM,HDR,TGA
7058 2.06 (2015-04-19) fix bug where PSD returns wrong '*comp' value
7059 2.05 (2015-04-19) fix bug in progressive JPEG handling, fix warning
7060 2.04 (2015-04-15) try to re-enable SIMD on MinGW 64-bit
7061 2.03 (2015-04-12) extra corruption checking (mmozeiko)
7062 stbi_set_flip_vertically_on_load (nguillemot)
7063 fix NEON support; fix mingw support
7064 2.02 (2015-01-19) fix incorrect assert, fix warning
7065 2.01 (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
7066 2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
7067 2.00 (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
7068 progressive JPEG (stb)
7069 PGM/PPM support (Ken Miller)
7070 STBI_MALLOC,STBI_REALLOC,STBI_FREE
7071 GIF bugfix -- seemingly never worked
7072 STBI_NO_*, STBI_ONLY_*
7073 1.48 (2014-12-14) fix incorrectly-named assert()
7074 1.47 (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
7075 optimize PNG (ryg)
7076 fix bug in interlaced PNG with user-specified channel count (stb)
7077 1.46 (2014-08-26)
7078 fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
7079 1.45 (2014-08-16)
7080 fix MSVC-ARM internal compiler error by wrapping malloc
7081 1.44 (2014-08-07)
7082 various warning fixes from Ronny Chevalier
7083 1.43 (2014-07-15)
7084 fix MSVC-only compiler problem in code changed in 1.42
7085 1.42 (2014-07-09)
7086 don't define _CRT_SECURE_NO_WARNINGS (affects user code)
7087 fixes to stbi__cleanup_jpeg path
7088 added STBI_ASSERT to avoid requiring assert.h
7089 1.41 (2014-06-25)
7090 fix search&replace from 1.36 that messed up comments/error messages
7091 1.40 (2014-06-22)
7092 fix gcc struct-initialization warning
7093 1.39 (2014-06-15)
7094 fix to TGA optimization when req_comp != number of components in TGA;
7095 fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
7096 add support for BMP version 5 (more ignored fields)
7097 1.38 (2014-06-06)
7098 suppress MSVC warnings on integer casts truncating values
7099 fix accidental rename of 'skip' field of I/O
7100 1.37 (2014-06-04)
7101 remove duplicate typedef
7102 1.36 (2014-06-03)
7103 convert to header file single-file library
7104 if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
7105 1.35 (2014-05-27)
7106 various warnings
7107 fix broken STBI_SIMD path
7108 fix bug where stbi_load_from_file no longer left file pointer in correct place
7109 fix broken non-easy path for 32-bit BMP (possibly never used)
7110 TGA optimization by Arseny Kapoulkine
7111 1.34 (unknown)
7112 use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
7113 1.33 (2011-07-14)
7114 make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
7115 1.32 (2011-07-13)
7116 support for "info" function for all supported filetypes (SpartanJ)
7117 1.31 (2011-06-20)
7118 a few more leak fixes, bug in PNG handling (SpartanJ)
7119 1.30 (2011-06-11)
7120 added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
7121 removed deprecated format-specific test/load functions
7122 removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
7123 error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
7124 fix inefficiency in decoding 32-bit BMP (David Woo)
7125 1.29 (2010-08-16)
7126 various warning fixes from Aurelien Pocheville
7127 1.28 (2010-08-01)
7128 fix bug in GIF palette transparency (SpartanJ)
7129 1.27 (2010-08-01)
7130 cast-to-stbi_uc to fix warnings
7131 1.26 (2010-07-24)
7132 fix bug in file buffering for PNG reported by SpartanJ
7133 1.25 (2010-07-17)
7134 refix trans_data warning (Won Chun)
7135 1.24 (2010-07-12)
7136 perf improvements reading from files on platforms with lock-heavy fgetc()
7137 minor perf improvements for jpeg
7138 deprecated type-specific functions so we'll get feedback if they're needed
7139 attempt to fix trans_data warning (Won Chun)
7140 1.23 fixed bug in iPhone support
7141 1.22 (2010-07-10)
7142 removed image *writing* support
7143 stbi_info support from Jetro Lauha
7144 GIF support from Jean-Marc Lienher
7145 iPhone PNG-extensions from James Brown
7146 warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
7147 1.21 fix use of 'stbi_uc' in header (reported by jon blow)
7148 1.20 added support for Softimage PIC, by Tom Seddon
7149 1.19 bug in interlaced PNG corruption check (found by ryg)
7150 1.18 (2008-08-02)
7151 fix a threading bug (local mutable static)
7152 1.17 support interlaced PNG
7153 1.16 major bugfix - stbi__convert_format converted one too many pixels
7154 1.15 initialize some fields for thread safety
7155 1.14 fix threadsafe conversion bug
7156 header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
7157 1.13 threadsafe
7158 1.12 const qualifiers in the API
7159 1.11 Support installable IDCT, colorspace conversion routines
7160 1.10 Fixes for 64-bit (don't use "unsigned long")
7161 optimized upsampling by Fabian "ryg" Giesen
7162 1.09 Fix format-conversion for PSD code (bad global variables!)
7163 1.08 Thatcher Ulrich's PSD code integrated by Nicolas Schulz
7164 1.07 attempt to fix C++ warning/errors again
7165 1.06 attempt to fix C++ warning/errors again
7166 1.05 fix TGA loading to return correct *comp and use good luminance calc
7167 1.04 default float alpha is 1, not 255; use 'void *' for stbi_image_free
7168 1.03 bugfixes to STBI_NO_STDIO, STBI_NO_HDR
7169 1.02 support for (subset of) HDR files, float interface for preferred access to them
7170 1.01 fix bug: possible bug in handling right-side up bmps... not sure
7171 fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
7172 1.00 interface to zlib that skips zlib header
7173 0.99 correct handling of alpha in palette
7174 0.98 TGA loader by lonesock; dynamically add loaders (untested)
7175 0.97 jpeg errors on too large a file; also catch another malloc failure
7176 0.96 fix detection of invalid v value - particleman@mollyrocket forum
7177 0.95 during header scan, seek to markers in case of padding
7178 0.94 STBI_NO_STDIO to disable stdio usage; rename all #defines the same
7179 0.93 handle jpegtran output; verbose errors
7180 0.92 read 4,8,16,24,32-bit BMP files of several formats
7181 0.91 output 24-bit Windows 3.0 BMP files
7182 0.90 fix a few more warnings; bump version number to approach 1.0
7183 0.61 bugfixes due to Marc LeBlanc, Christopher Lloyd
7184 0.60 fix compiling as c++
7185 0.59 fix warnings: merge Dave Moore's -Wall fixes
7186 0.58 fix bug: zlib uncompressed mode len/nlen was wrong endian
7187 0.57 fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
7188 0.56 fix bug: zlib uncompressed mode len vs. nlen
7189 0.55 fix bug: restart_interval not initialized to 0
7190 0.54 allow NULL for 'int *comp'
7191 0.53 fix bug in png 3->4; speedup png decoding
7192 0.52 png handles req_comp=3,4 directly; minor cleanup; jpeg comments
7193 0.51 obey req_comp requests, 1-component jpegs return as 1-component,
7194 on 'test' only check type, not whether we support this variant
7195 0.50 (2006-11-19)
7196 first released version
7197 */