API Reference

range_streams provides file-like object handling through an API familiar to users of the standard library io module. It uses Range, RangeSet, and RangeDict classes (from the externally maintained python-ranges library) to represent and look up range operations in an efficient linked list data structure.

Servers that support HTTP range requests can serve partial content, which avoids having to download and consume a file linearly from the start when streaming, or having to download the entire file (for non-streaming requests).

A RangeStream is initialised by providing:

  • a URL (the file to be streamed)

  • (optionally) a client (httpx.Client), or else a fresh one is created

  • (optionally) a range, as either: Range from the python-ranges package [recommended]; or a tuple of integers, presumed to be a half-open interval inclusive of start/exclusive of stop as is common practice in Python — [start, stop) in interval notation.

If no range (or the empty range) is given, an HTTP HEAD request will be sent instead of a GET request, to check the total length of the file being streamed. Either way, the total file length is determined upon initialisation (total_bytes, also available as total_range, the range spanning the entire file).

The following example shows the basic setup for a single range.

>>> from ranges import Range
>>> from range_streams import RangeStream, _EXAMPLE_URL
>>> s = RangeStream(url=_EXAMPLE_URL) 
>>> rng = Range(0,3) 
>>> s.add(rng) 
>>> s.ranges 
RangeDict{RangeSet{Range[0, 3)}: RangeResponse ⠶ [0, 3) @ 'example_text_file.txt' from raw.githubusercontent.com}

Once a request is made for a non-empty range, the RangeStream acquires the first entry in the RangeDict stored on the ranges attribute. This gates access to the internal _ranges attribute (a RangeDict), taking into account whether the bytes in each range’s RangeResponse are exhausted or removed due to overlap with another range. See the docs for further details.

Further ranges are requested by simply calling the add() method with another Range object. To create this implicitly, you can simply provide a byte range to the add method as a tuple of two integers, which will be interpreted per the usual convention for ranges in Python, as an [a,b) half-open interval.

>>> s.add(byte_range=(7,9)) 
>>> s.ranges 
RangeDict{
  RangeSet{Range[0, 3)}: RangeResponse ⠶ [0, 3) @ 'example_text_file.txt' from raw.githubusercontent.com,
  RangeSet{Range[7, 9)}: RangeResponse ⠶ [7, 9) @ 'example_text_file.txt' from raw.githubusercontent.com
}

Codecs are available for .zip (ZipStream) and .conda (CondaStream) archives, which will read and name the ranges corresponding to the archive’s contents file list upon initialisation.

>>> from range_streams import _EXAMPLE_ZIP_URL
>>> from range_streams.codecs import ZipStream
>>> s = ZipStream(url=_EXAMPLE_ZIP_URL) 
>>> s.ranges 
RangeDict{
  RangeSet{Range[51, 62)}: RangeResponse ⠶ "example_text_file.txt" [51, 62) @ 'example_text_file.txt.zip' from raw.githubusercontent.com
}

The .conda format is just a particular type of zip for Python packages on the conda package manager (containing JSON and Zstandard-compressed tarballs):

>>> from range_streams.codecs import CondaStream
>>> EXAMPLE_CONDA_URL = "https://repo.anaconda.com/pkgs/main/linux-64/progressbar2-3.34.3-py27h93d0879_0.conda" 
>>> s = CondaStream(url=EXAMPLE_CONDA_URL) 
>>> s.ranges 
RangeDict{
  RangeSet{Range[77, 6427)}: RangeResponse ⠶ "info-progressbar2-3.34.3-py27h93d0879_0.tar.zst" [77, 6427) @ 'progressbar2-3.34.3-py27h93d0879_0.conda' from repo.anaconda.com
  RangeSet{Range[6503, 39968)}: RangeResponse ⠶ "pkg-progressbar2-3.34.3-py27h93d0879_0.tar.zst" [6503, 39968) @ 'progressbar2-3.34.3-py27h93d0879_0.conda' from repo.anaconda.com
  RangeSet{Range[40011, 40042)}: RangeResponse ⠶ "metadata.json" [40011, 40042) @ 'progressbar2-3.34.3-py27h93d0879_0.conda' from repo.anaconda.com
}

A further codec handles PNG images (a file format composed of ‘chunks’ of different types). The metadata can be identified from looking in the IHDR chunk and checking for the presence of other chunks. Some properties are made available ‘as direct’ (i.e. reliably, regardless of the specific PNG compression) mimicking the approach of the PyPNG library.

>>> from range_streams import _EXAMPLE_PNG_URL
>>> from range_streams.codecs import PngStream
>>> s = PngStream(url=_EXAMPLE_PNG_URL) 
>>> s.alpha_as_direct 
True
>>> s.channel_count_as_direct 
4
>>> s.chunks 
{'IHDR': [PngChunkInfo :: {'data_range': Range[16, 29), 'end': 33, 'length': 13, 'start': 8, 'type': 'IHDR'}],
 'zTXt': [PngChunkInfo :: {'data_range': Range[41, 1887), 'end': 1891, 'length': 1846, 'start': 33, 'type': 'zTXt'}],
 'iCCP': [PngChunkInfo :: {'data_range': Range[1899, 2287), 'end': 2291, 'length': 388, 'start': 1891, 'type': 'iCCP'}],
 'bKGD': [PngChunkInfo :: {'data_range': Range[2299, 2305), 'end': 2309, 'length': 6, 'start': 2291, 'type': 'bKGD'}],
 'pHYs': [PngChunkInfo :: {'data_range': Range[2317, 2326), 'end': 2330, 'length': 9, 'start': 2309, 'type': 'pHYs'}],
 'tIME': [PngChunkInfo :: {'data_range': Range[2338, 2345), 'end': 2349, 'length': 7, 'start': 2330, 'type': 'tIME'}],
 'tEXt': [PngChunkInfo :: {'data_range': Range[2357, 2382), 'end': 2386, 'length': 25, 'start': 2349, 'type': 'tEXt'}],
 'IDAT': [PngChunkInfo :: {'data_range': Range[2394, 5108), 'end': 5112, 'length': 2714, 'start': 2386, 'type': 'IDAT'}],
 'IEND': [PngChunkInfo :: {'data_range': Range[5120, 5120), 'end': 5124, 'length': 0, 'start': 5112, 'type': 'IEND'}]}
>>> s.data.IHDR 
IHDRChunk :: {'bit_depth': 8, 'channel_count': 4, 'colour_type': 6, 'compression': 0, 'end_pos': 29, 'filter_method': 0, 'height': 100, 'interlacing': 0, 'start_pos': 16, 'struct': '>IIBBBBB', 'width': 100}
>>> s.get_idat_data()[:4] 
[153, 0, 0, 255]

Range streams

This class represents a file being streamed as a sequence of non-overlapping ranges.


range_streams.stream exposes a class RangeStream, whose key property (once initialised) is ranges, which provides a RangeDict comprising the ranges of the file being streamed.

The method add() will request further ranges, and (unlike the other methods in this module) will accept a tuple of two integers as its argument (byte_range).

class range_streams.stream.RangeStream(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True)[source]

Bases: object

A class representing a file being streamed from a server which supports range requests, with the ranges property providing a list of those intervals requested so far (and not yet exhausted).

When the class is initialised, its length is checked upon the first range request, and the client provided is not closed (you must handle this yourself). Further ranges may be requested on the RangeStream by calling add().

Both the __init__() and add() methods support the specification of a range interval as either a tuple of two integers or a Range from the python-ranges package (an external requirement installed alongside this package). Either way, the interval created is interpreted to be the standard Python convention of a half-open interval [start,stop).

Don’t forget to close the httpx.Response yourself! The close() method is available (or aclose() when using an async client) to help you.

__init__(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True)[source]

Set up a stream for the file at url, with either an initial range to be requested (HTTP partial content request), or if left as the empty range (default: Range(0,0)) a HEAD request will be sent instead, so as to set the total size of the target file on the total_bytes property.

By default (if client is left as None) a fresh httpx.Client will be created for each stream.

The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.

If byte_range is passed as the empty range Range(0,0) (its default), then a HEAD request is sent to url on initialisation, setting the total_bytes value from the content-length header in the subsequent response.

If single_request is True (default: False), then passing an empty byte_range instead sends a standard streaming GET request (not a partial content request at all). The class then provides an interface that ‘simulates’ partial content requests, i.e. as if each call to add() returned its range request instantly (as everything needed was already obtained by the first request at initialisation). This is more performant when reading a stream linearly.

Note: internally, this single request is known as ‘the monostream’, and is stored on the monostream property.

Note: a single request will not be as efficient if streaming the response non-linearly (since reading a byte in the stream requires loading all bytes up to it). This will mean it is only performant to use for certain file types or applications (e.g. a ZIP file is read “in a principled manner” from the end [the Central Directory] first, so gains greatly from using multiple partial content requests rather than a single stream, whereas a PNG file can only be read “in a principled manner” linearly, iterating through the chunks from the start).
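The cost of non-linear reads on a single stream can be illustrated with a minimal, stdlib-only sketch. The ForwardOnlyBuffer class below is hypothetical (it is not part of range_streams) and only assumes that bytes arrive from a forward-only chunk iterator, as they would from a streamed response:

```python
import io
from typing import Iterator


class ForwardOnlyBuffer:
    """Hypothetical helper: buffers a forward-only chunk iterator."""

    def __init__(self, chunks: Iterator[bytes]) -> None:
        self._chunks = chunks
        self._buf = io.BytesIO()
        self.loaded = 0  # number of bytes pulled from the iterator so far

    def _load_until(self, position: int) -> None:
        # Every byte before `position` must be pulled first: no skipping ahead.
        while self.loaded < position:
            chunk = next(self._chunks, None)
            if chunk is None:
                break  # the stream is exhausted
            self._buf.seek(0, io.SEEK_END)
            self._buf.write(chunk)
            self.loaded += len(chunk)

    def read_window(self, start: int, stop: int) -> bytes:
        """Read the half-open interval [start, stop) from the stream."""
        self._load_until(stop)
        self._buf.seek(start)
        return self._buf.read(stop - start)


data = bytes(range(100))

# Reading the first 3 bytes only pulls the first 10-byte chunk.
linear = ForwardOnlyBuffer(data[i : i + 10] for i in range(0, 100, 10))
head = linear.read_window(0, 3)

# Reading the last 5 bytes forces all 100 bytes to be pulled.
from_end = ForwardOnlyBuffer(data[i : i + 10] for i in range(0, 100, 10))
tail = from_end.read_window(95, 100)
```

Here linear.loaded ends up far smaller than from_end.loaded, which is the reason a format read from the end (such as ZIP) tends to favour separate partial content requests.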

The pruning_level controls the policy for overlap handling (0 will resize overlapped ranges, 1 will delete overlapped ranges, and 2 will raise an error when a new range is added which overlaps a pre-existing range).
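As a rough sketch of the three policies, the following stdlib-only function operates on plain (start, stop) tuples rather than Range objects. The name add_with_pruning is hypothetical, and splitting an existing range when the new one falls entirely inside it is a simplifying assumption of this sketch, not a documented behaviour of the library:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, stop)


def add_with_pruning(
    existing: List[Interval], new: Interval, pruning_level: int = 0
) -> List[Interval]:
    """Return the sorted ranges after adding `new` under the given policy."""
    new_start, new_stop = new
    kept: List[Interval] = []
    for start, stop in existing:
        overlaps = start < new_stop and new_start < stop
        if not overlaps:
            kept.append((start, stop))
        elif pruning_level == 2:
            # 'strict': refuse any overlap
            raise ValueError(f"[{new_start}, {new_stop}) overlaps [{start}, {stop})")
        elif pruning_level == 1:
            # 'burn': drop the overlapped range entirely
            continue
        else:
            # 'replant': keep only the parts of the old range outside the new one
            if start < new_start:
                kept.append((start, new_start))
            if new_stop < stop:
                kept.append((new_stop, stop))
    return sorted(kept + [new])


replanted = add_with_pruning([(0, 5)], (3, 8), pruning_level=0)
burned = add_with_pruning([(0, 5)], (3, 8), pruning_level=1)
```

With these inputs the 'replant' policy trims the old range to (0, 3), whereas the 'burn' policy leaves only the new range.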

The chunk_size controls the size of the chunks that are read in from the httpx.Response.iter_raw iterator on the streamed HTTP response.

Parameters:
  • url (str) – (str) The URL of the file to be streamed

  • client – (httpx.Client | None) The HTTPX client to use for HTTP requests

  • byte_range (Range | tuple[int, int]) – (Range | tuple[int,int]) The range of positions on the file to be requested

  • pruning_level (int) – (int) Either 0 (‘replant’), 1 (‘burn’), or 2 (‘strict’)

  • single_request (bool) – (bool) Whether to use a single GET request and just add ‘windows’ onto this rather than create multiple partial content requests.

  • force_async (bool) – (bool | None) Whether to require the client to be httpx.AsyncClient, and if no client is given, to create one on initialisation. (Experimental/WIP)

  • chunk_size (int | None) – (int | None) The chunk size used for the httpx.Response.iter_raw response byte iterators

  • raise_response (bool) – (bool) Whether to raise HTTP status code exceptions

__ranges_repr__()[source]
Return type:

str

_active_range: Range | None = None

Set by set_active_range(), through which the active_range_response property gives access to the currently ‘active’ range (usually the most recently created).

_ranges: RangeDict

‘Internal’ ranges attribute. Start position is not affected by reading in bytes from the RangeResponse (unlike the ‘external’ ranges property)

async aclose()[source]

Close any httpx.Response on the async stream. In single request mode, there is just the one (shared with all the ‘windowed’ responses).

Return type:

None

property active_range_response: RangeResponse

Look up the RangeResponse object associated with the currently active range by using _active_range as the Range key for the internal _ranges RangeDict.

Look it up in the _range_windows RangeDict instead if in single request mode.

add(byte_range=Range[0, 0), activate=True, name='')[source]

Add a range to the stream. If it is empty and the length of the stream has not already been determined, this will initiate a HEAD request to check the file’s total size. In all other cases, add the Range to the RangeDict of ranges and set up a streaming partial content GET request, but do not try to read any bytes from it (so no response data will be downloaded upon creation).

The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.

If activate is True, make this range the active range upon adding it to the stream (allowing access to the associated response through the active_range_response property).

If a name is provided (used in subclasses where the stream is an archive with individually named files within it), assign this name to the RangeResponse (as its range_name argument).

Parameters:
  • byte_range (Range | tuple[int, int]) – (Range | tuple[int,int]) The range of positions on the file to be requested and stored in the RangeDict on ranges

  • activate (bool) – (bool) Whether to make this newly added Range the active range on the stream upon creating it.

  • name (str) – (str) A name (default: '') to give to the range.

Return type:

None

async add_async(byte_range=Range[0, 0), activate=True, name='')[source]
Return type:

None

add_window(byte_range=Range[0, 0), activate=True, name='')[source]

Register a window onto the original range in the _ranges RangeDict rather than add a new range entry to the dict (which would A) clash with the single entire range, and B) require another request).

Parameters:
  • byte_range (Range | tuple[int, int]) – (Range | tuple[int,int]) The range of positions on the file to be read from the request on the stream.

  • activate (bool) – (bool) Whether to make this newly added Range the active range on the stream upon creating it.

  • name (str) – (str) A name (default: '') to give to the range.

Return type:

None

burn_range(overlapped_ext_rng)[source]

Get the internal range (i.e. without offsets applied from the current read position on the range) from the external one (which may differ if the seek position has advanced from the start position, usually due to reading bytes from the range). Once this internal range has been identified, delete it, and set the _active_range to the most recent (or if the stream becomes empty, set it to None).

Parameters:

overlapped_ext_rng (Range) – the overlapped external range

check_is_subrange(rng)[source]
check_range_integrity(use_windows=False)[source]

Every RangeSet in the _ranges RangeDict keys must contain 1 Range each

Return type:

None

check_response_length(headers, req)[source]

Return the length of the response from its content-length header (after checking it contains this header, else raising KeyError), as an integer.

Parameters:
  • headers (dict[str, str]) – The response headers

  • req (str) – The request method (to be reported in any KeyError raised)

Return type:

int
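A minimal sketch of this kind of check, assuming the headers arrive as a plain dict whose key casing is not guaranteed. The function name content_length_from is hypothetical and stands in for the method described above:

```python
from typing import Dict


def content_length_from(headers: Dict[str, str], req: str = "GET") -> int:
    """Return the content-length header as an integer, or raise KeyError."""
    # Header names are case-insensitive, so normalise the keys before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    if "content-length" not in lowered:
        raise KeyError(f"{req} response headers lack a content-length header")
    return int(lowered["content-length"])


length = content_length_from({"Content-Length": "11"}, req="HEAD")
```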

property client_is_async
close()[source]

Close any httpx.Response on the stream. In single request mode, there is just the one (shared with all the ‘windowed’ responses).

Return type:

None

compute_external_ranges(use_windows=False)[source]

If use_windows is True, the internal_range_dict is _range_windows; if it is False (the default), the internal_range_dict is _ranges.

Compute the external ranges by adjusting the ranges of the internal_range_dict to account for the bytes consumed (from the head) and the tail mark offset of where a range was already trimmed to avoid an overlap (from the tail).

While the RangeSet keys are a deep copy of the internal_range_dict RangeDict keys (and therefore will not propagate if modified), the RangeResponse values are references, therefore will propagate to the internal_range_dict RangeDict if modified (primarily when read).

When use_windows is True, these RangeResponse values are ‘simulations’ (a.k.a. mock/dummy objects) of the range response that would be received from a partial content request (they in fact merely came from a streamed GET request).

Return type:

RangeDict

property domain: str
ext2int(ext_rng)[source]

Given the external range ext_rng and the RangeStream on which it is ‘stored’ (or rather, computed, in the ranges property), return the internal Range stored on the _ranges attribute of the RangeStream, by looking up the shared RangeResponse value.

Parameters:

ext_rng (Range) – A Range from the ‘external’ ranges with which to cross-reference in _ranges to identify the corresponding ‘internal’ range.

Return type:

RangeResponse

property freely_requestable

Trivial opposite of the single_request attribute, so that conditional blocks can treat this as the ‘conventional’ case and the single request case be the alternative (which looks better).

async get_async_monostream()[source]

Send a streaming GET request with an open-ended range header, to obtain the total range. Suitable for higher performance (to avoid repeated requests on the RangeStream which accrue a time cost).

Should be called after the RangeStream is initialised (with both single_request and force_async as True), and [unlike the initialisation method] of course this method must be awaited.

Return type:

None

get_monostream()[source]

Send a streaming GET request with an open-ended range header, to obtain the total range. Suitable for higher performance (to avoid repeated requests on the RangeStream which accrue a time cost).

Called at initialisation (within the first) when single_request is passed to RangeStream as True.

Return type:

None

handle_overlap(rng, internal=False, use_windows=False)[source]

Handle overlaps according to the pruning level:

  0. “replant” ranges overlapped at the head with fresh, disjoint ranges ‘downstream’ or mark their tails to effectively truncate them if overlapped at the tail

  1. “burn” existing ranges overlapped anywhere by the new range

  2. “strict” will throw a ValueError

Return type:

None

property is_closed

True if the httpx.Response object(s) associated with the RangeResponse values in the internal _ranges RangeDict is/are all closed.

isempty()[source]

Whether the internal _ranges RangeDict is empty (contains no range-RangeResponse key-value pairs).

Return type:

bool

list_ranges()[source]

Retrieve ascending order list of RangeSet keys, as a list of Range.

The RangeSet to Range transformation is permitted because the ranges property method begins by checking range integrity, which requires each RangeSet to be a singleton set (of a single Range).

If activate is True (the default), the range will be made the active range of the RangeStream upon being registered (if it meets the criteria for registration).

If pruning_level is 0 then overlaps are handled using a “replant” policy (redefine and overwrite the existing range to be disjoint when the new range would overlap it), if it’s 1 they are handled with a “burn” policy (simply dispose of the existing range to eliminate any potential overlap), and if it’s 2 using a “strict” policy (raising errors upon detecting overlap).

Return type:

list[Range]

classmethod make_async_fetcher(urls, callback=None, verbose=False, show_progress_bar=True, timeout_s=5.0, client=None, close_client=False, **kwargs)[source]
property name: str
overlap_whence(rng, internal=False, use_windows=False)[source]
Return type:

int | None

property ranges

Read-only view on the RangeDict stored in the _ranges attribute, modifying it to account for the bytes consumed (from the head) and tail mark offset of where a range was already trimmed to avoid an overlap (from the tail).

Each ranges RangeDict key is a RangeSet containing 1 Range. Check this assumption (singleton RangeSet “integrity”) holds and retrieve this list of RangeSet keys in ascending order, as a list of Range.

Requests are restricted to not re-request already-requested file ranges, so windows onto the underlying range are given instead; these can be consumed (but the underlying RangeResponse will persist and cannot be consumed by reading).
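The relationship between an internal range and its external view can be sketched as simple arithmetic. The TrackedRange class below is hypothetical, and it assumes that the tail mark is a count of bytes trimmed from the end of the range; the library's own bookkeeping may differ:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TrackedRange:
    """Hypothetical record of an internal half-open range [start, stop)."""

    start: int
    stop: int
    told: int = 0  # bytes already consumed from the head
    tail_mark: int = 0  # bytes trimmed from the tail (assumed meaning)

    def external(self) -> Tuple[int, int]:
        # The internal range never changes; only the external view shrinks.
        return (self.start + self.told, self.stop - self.tail_mark)


tracked = TrackedRange(start=0, stop=10, told=3, tail_mark=2)
external_view = tracked.external()
```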

read(size=None)[source]
Return type:

bytes

register_range(rng, value, activate=True, use_windows=False)[source]
seek(position, whence=0)[source]
Return type:

None

send_head_request()[source]

Send a ‘plain’ HEAD request without range headers, to check the total content length without creating a RangeRequest (simply discard the response as it can only be associated with the empty range, which cannot be stored in a RangeDict), raising for status ASAP. To be used when initialised with an empty byte range. If the range_streams.stream.RangeStream.client is asynchronous, use a synchronous client (created for this single request).

Return type:

None

send_request(byte_range)[source]
Return type:

RangeRequest

set_active_range(rng)[source]

Setter for the active range (through which active_range_response is also set).

set_client(client, force_async)[source]

Check client type explicitly to handle a/sync and optional HTTPX client.

Parameters:
  • client – (httpx.Client | httpx.AsyncClient | None) The client to be used for all HTTP requests made on the range_streams.stream.RangeStream. If None, a fresh one will be created.

  • force_async (bool) – (bool) If the client is None, this parameter determines whether httpx.Client or httpx.AsyncClient is set as the client. If a synchronous client is given and force_async is True, an error will be raised.

Return type:

None

set_length(length)[source]
Return type:

None

simulate_request(byte_range, parent_range_request=None)[source]

Simulate the RangeRequest obtained from a partial content request for byte_range on the stream’s URL through a “window” on parent_range_request (expected to be a streamed GET request for the full file range).

If no parent_range_request is provided, it is assumed to be the one on the RangeResponse in the internal _ranges RangeDict

Parameters:
  • byte_range (Range) – The Range to simulate a partial content request for.

  • parent_range_request (RangeRequest | None) – The RangeRequest over which to use a “window” to simulate the range request.

Return type:

RangeRequest

property spanning_range: Range
property sync_client

Provide a synchronous client: either the stream’s client, or a fresh one if the stream’s client is asynchronous. Used for HEAD requests on an async RangeStream. Presumes a client has been set correctly.

tell()[source]
Return type:

int

property total_bytes: int | None

The total number of bytes (i.e. the length) of the file being streamed.

property total_range: Range

HTTP request helper functions

These helper functions help prepare HTTP requests to set up a stream.


When preparing a HTTP GET request, the HTTP range request header must be provided as a dict, for example:

{"range": "bytes=0-1"}

would request the two bytes at positions 0 and 1 (i.e. the inclusive interval [0,1]).

An empty range can also be specified with the value "bytes=-0", which is useful to determine the total length of a file (as the Content-Range header returned by the server contains the total size of the file from which the range was taken).
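The off-by-one conversion between Python's half-open intervals and the inclusive positions of the range header can be sketched without any dependencies. The function name half_open_to_range_header is hypothetical (it is not part of range_streams), and the empty range is deliberately left out of this sketch:

```python
from typing import Dict


def half_open_to_range_header(start: int, stop: int) -> Dict[str, str]:
    """Build a range header dict for the non-empty interval [start, stop)."""
    if not 0 <= start < stop:
        raise ValueError("expected a non-empty interval with 0 <= start < stop")
    # HTTP byte ranges are inclusive at both ends, so the last byte is stop - 1.
    return {"range": f"bytes={start}-{stop - 1}"}


first_two_bytes = half_open_to_range_header(0, 2)
```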

exception range_streams.http_utils.PartialContentStatusError(*, request, response)[source]

Bases: Exception

The response had any HTTP status code other than 206 (Partial Content).

May be raised when calling raise_for_non_partial_content()

range_streams.http_utils.byte_range_from_range_obj(rng)[source]

Prepare the byte range substring for a HTTP range request.

For example:

>>> from ranges import Range
>>> from range_streams.http_utils import byte_range_from_range_obj
>>> byte_range_from_range_obj(Range(0,2))
'0-1'
Parameters:

rng (Range) – range of the bytes to be requested (0-based)

Return type:

str

Returns:

A hyphen-separated string of start and end positions. The start position is missing if the range provided is empty, and this corresponds to a request for “the last zero bytes” i.e. an empty range request.

range_streams.http_utils.detect_header_value(headers, key, source='Response')[source]

Detect a title case, lower case, or capitalised version of the given string.

range_streams.http_utils.range_header(rng)[source]

Prepare a dict to pass as a httpx request header with a single key, range, whose value is the byte range.

For example:

>>> from ranges import Range
>>> from range_streams.http_utils import range_header
>>> range_header(Range(0,2))
{'range': 'bytes=0-1'}
>>> range_header(Range(0,0))
{'range': 'bytes=0-'}
Parameters:

rng (Range) – range of the bytes to be requested (0-based)

Return type:

dict[str, str]

Returns:

dict suitable to be passed to httpx.Client.build_request in setup_stream() through range_header

Asynchronous fetcher

This helper class handles all of the details of asynchronously fetching streams, given a list of URLs.


class range_streams.async_utils.AsyncFetcher(stream_cls, urls, callback=None, verbose=False, show_progress_bar=True, timeout_s=5.0, client=None, close_client=False, **kwargs)[source]

Bases: object

async async_fetch_urlset(urls)[source]

If the client is None, create one in a contextmanager block (i.e. close it immediately after use), otherwise use the one provided, not in a contextmanager block (i.e. leave it up to the user to close the client).

Parameters:

urls (Iterator[str]) – The URLs to fetch, as an exhaustible iterator (not a Sequence)

Return type:

Coroutine
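The client ownership rule described here can be sketched synchronously with the standard library alone. DummyClient and fetch_urlset are hypothetical stand-ins (the real method is asynchronous and uses an HTTPX client), and no network access takes place:

```python
from contextlib import nullcontext


class DummyClient:
    """Hypothetical stand-in for an HTTP client usable as a context manager."""

    def __init__(self) -> None:
        self.is_closed = False

    def __enter__(self) -> "DummyClient":
        return self

    def __exit__(self, *exc_info) -> None:
        self.is_closed = True


def fetch_urlset(urls, client=None):
    """Return the client used, after 'fetching' each URL."""
    # Only a client created here is closed here; a provided one is left open.
    manager = DummyClient() if client is None else nullcontext(client)
    with manager as active_client:
        for _url in urls:
            pass  # a real implementation would request each URL here
    return active_client


own = fetch_urlset(iter(["https://example.com/a"]))
shared = DummyClient()
reused = fetch_urlset(iter(["https://example.com/b"]), client=shared)
```

After these calls the client created internally is closed, while the shared client is still open and remains the caller's responsibility.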

complete_row(row_index)[source]

Add the range corresponding to the range at row row_index to the completed RangeSet, meaning it will be omitted on any further call to make_calls(). This should be done to indicate the URL at that row has been processed (either successfully or unsuccessfully, e.g. it gave a 404).

Return type:

None

async fetch(client, url)[source]
Parameters:
  • client – httpx.AsyncClient

  • url – httpx.URL

Return type:

TypeVar(_T, bound= range_streams.stream.RangeStream)

async fetch_and_process(urls, client)[source]
fetch_things(urls)[source]
property filtered_url_list: list[str]
immediate_exit(signal_enum, loop)[source]
Return type:

None

make_calls()[source]

The method called to run the event loop to fetch URLs, after initialisation and/or repeatedly upon exiting the loop (i.e. it can recover from errors).

mark_url_complete(url)[source]

Add the row index for the given URL in the url_list to the completed RangeSet, meaning it will be omitted on any further call to make_calls(). This should be done to indicate the URL has been processed (either successfully or unsuccessfully, e.g. it gave a 404).

Return type:

None
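The bookkeeping behind completed URLs can be sketched as follows. UrlTracker is a hypothetical class, and it uses a plain set of row indices where the description above refers to a RangeSet:

```python
class UrlTracker:
    """Hypothetical sketch of tracking which URLs have been processed."""

    def __init__(self, urls):
        self.url_list = list(urls)
        self.completed = set()  # row indices of processed URLs

    def mark_url_complete(self, url):
        # Record the row index whether the fetch succeeded or failed.
        self.completed.add(self.url_list.index(url))

    @property
    def filtered_url_list(self):
        # Only the URLs that still need fetching on the next attempt.
        return [
            url
            for row, url in enumerate(self.url_list)
            if row not in self.completed
        ]

    @property
    def total_complete(self):
        return len(self.completed)


tracker = UrlTracker(["https://example.com/a", "https://example.com/b"])
tracker.mark_url_complete("https://example.com/a")
remaining = tracker.filtered_url_list
```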

async process_stream(range_stream)[source]

Process an awaited RangeStream within an async fetch loop, calling the callback set on the callback attribute.

Parameters:

range_stream (TypeVar(_T, bound= range_streams.stream.RangeStream)) – The awaited RangeStream (or one of its subclasses)

async set_async_signal_handlers()[source]
Return type:

None

set_up_progress_bar()[source]
property total_complete: int
exception range_streams.async_utils.SignalHaltError(signal_enum)[source]

Bases: SystemExit

property exit_code: int

Overlap handling

These helper functions report on/handle the various possible ways ranges can overlap, and the actions taken if an overlap is found.


range_streams.overlaps.get_range_containing(rng_dict, position)[source]

Get a Range from rng_dict by looking up the position it contains, where rng_dict is either the internal RangeStream._ranges attribute or the external ranges property.

Presumes range integrity has been checked.

Raises ValueError if position is not in rng_dict.

Parameters:
  • rng_dict (RangeDict) – input range

  • position (int) – the position at which to look up

Return type:

Range

range_streams.overlaps.overlap_whence(rng_dict, rng)[source]

Determine whether any overlap exists and, if so, whence (i.e. from where) on the pre-existing range it overlapped: 0 if the new range overlapped at the start (‘head’) of the existing range, 1 if fully contained (in the ‘body’), 2 if at the end (‘tail’), or None if the range is non-overlapping with any pre-existing range.

Note: same convention as Python io module’s SEEK_SET, SEEK_CUR, and SEEK_END.

Return type:

int | None
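The classification can be sketched for a single pre-existing range using plain (start, stop) tuples. The function name whence_of_overlap is hypothetical, and treating a new range that covers the whole existing range as a head overlap is an assumption of this sketch:

```python
import io
from typing import Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, stop)


def whence_of_overlap(existing: Interval, new: Interval) -> Optional[int]:
    """Classify where `new` overlaps `existing`, or return None if disjoint."""
    ex_start, ex_stop = existing
    new_start, new_stop = new
    if new_stop <= ex_start or ex_stop <= new_start:
        return None  # no shared positions
    if new_start <= ex_start:
        return io.SEEK_SET  # 0: overlap at the head
    if new_stop >= ex_stop:
        return io.SEEK_END  # 2: overlap at the tail
    return io.SEEK_CUR  # 1: contained in the body


head_overlap = whence_of_overlap((10, 20), (5, 12))
body_overlap = whence_of_overlap((10, 20), (12, 15))
tail_overlap = whence_of_overlap((10, 20), (18, 25))
no_overlap = whence_of_overlap((10, 20), (20, 25))
```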

Requests and responses

These classes facilitate the streaming of data from a URL, and handling the response as a file-like object.


class range_streams.request.RangeRequest(byte_range, url, client, GET_got=None, window_on_range=Range[0, 0), chunk_size=None)[source]

Bases: object

Store a GET request and the response stream while keeping a reference to the client that spawned it, providing an overridable _iterator attribute [by default giving access to iter_raw()] on the underlying httpx.Response, suitable for RangeResponse to wrap in a io.BytesIO buffered stream. For async clients, _aiterator is set instead [giving access to aiter_raw()] on the

async aiter_raw()[source]

Wrap the aiter_raw() method of the underlying httpx.Response object within the RangeResponse in response.

Return type:

AsyncIterator[bytes]

property aiterator_initialised
async await_aiterator()[source]

Initialise the async iterator on the _aiterator attribute from the stored function which when called returns the typing.AsyncIterator[bytes].

Return type:

None

check_client()[source]

Type checking workaround (Sphinx type hint extension does not like httpx so check the type manually with a method called at initialisation).

property client_is_async
close()[source]

Close the response RangeResponse.

Return type:

None

content_range_header()[source]

Validate request was range request by presence of content-range header

Return type:

str

classmethod from_get_stream(byte_range, client, req, resp, chunk_size=None)[source]

Avoid making a new partial content request, instead interpret a streaming GET request as one when provided along with a byte_range.

Does not call raise_for_non_partial_content() as is done after setting the request and response in setup_stream().

Note: req and resp are type checked ‘manually’ at init (not via type hints) due to Sphinx type hints bug with the httpx library.

Parameters:
  • byte_range (Range) – The Range provided by this request.

  • req – The sent httpx.Request

  • resp – The received httpx.Response

  • chunk_size (int | None) – The size of chunks to read the response into the buffer with

Return type:

RangeRequest

iter_raw()[source]

Wrap the iter_raw() method of the underlying httpx.Response object within the RangeResponse in response.

Return type:

Iterator[bytes]

raise_for_non_partial_content()[source]

Raise the PartialContentStatusError if the response status code is anything other than 206 (Partial Content), as that is what was requested.
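A minimal sketch of such a check, using only the standard library. The names NotPartialContent and ensure_partial_content are hypothetical and stand in for the exception and method described above:

```python
from http import HTTPStatus


class NotPartialContent(Exception):
    """Hypothetical error for any status other than 206 (Partial Content)."""


def ensure_partial_content(status_code: int) -> None:
    # A server that ignores the range header typically answers 200, not 206.
    if status_code != HTTPStatus.PARTIAL_CONTENT:
        raise NotPartialContent(
            f"expected 206 Partial Content, got {status_code}"
        )


ensure_partial_content(206)  # passes silently
```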

property range_header
setup_stream()[source]

client.stream("GET", url) but leave the stream to be manually closed rather than using a context manager

Return type:

None

property total_content_length: int

Obtain the total content length from the content-range header of a partial content HTTP GET request. This method is not used for the HTTP HEAD request sent when a RangeStream is initialised with an empty Range (since that is not a partial content request it returns a content-length header which can be read as an integer directly).
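The total length sits after the slash in a content-range header value such as "bytes 0-1/11". A minimal sketch of extracting it follows; the function name total_length_from is hypothetical and the parsing is a simplified reading of the header format, not the library's own code:

```python
import re

# Either "bytes <first>-<last>/<total>" or "bytes */<total>";
# the total may be "*" when the server does not know it.
CONTENT_RANGE = re.compile(r"bytes (?:(\d+)-(\d+)|\*)/(\d+|\*)")


def total_length_from(content_range: str) -> int:
    """Return the total file size stated in a content-range header value."""
    match = CONTENT_RANGE.fullmatch(content_range.strip())
    if match is None or match.group(3) == "*":
        raise ValueError(f"no total length in content-range: {content_range!r}")
    return int(match.group(3))


total = total_length_from("bytes 0-1/11")
```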

classmethod windowed_request(byte_range, range_request, tail_mark, chunk_size)[source]

Reuse the stream from an existing streaming request, rather than making a new request, to create a new ‘windowed’ RangeRequest from an existing RangeRequest, but change the byte range to be used on it. If the existing RangeRequest (range_request) is anything other than a stream of the full file range, then relative ranges will need to be calculated. This constructor was written on the assumption of a full file range.

Parameters:
  • byte_range (Range) – The Range provided by this request.

  • range_request – The existing RangeRequest whose stream is reused

  • tail_mark (int) – The tail_mark to trim the byte_range (if any). Passed separately

  • chunk_size (int | None) – The chunk size to the httpx.Response.iter_raw iterator (or httpx.Response.aiter_raw if using an async client)

Return type:

RangeRequest

class range_streams.response.RangeResponse(stream, range_request, range_name='')[source]

Bases: object

Adapted from obskyr’s ResponseStream demo code, this class handles the streamed partial request as a file-like object.

Don’t forget to close the httpx.Response yourself! The close() method is available (or aclose() when using an async client) to help you.

async aclose()[source]

Close the associated httpx.Response object. In single request mode, there is just the one (shared with all the ‘windowed’ responses).

async aread(size=None)[source]

File-like reading within the range request stream, with careful handling of windowed ranges and tail marks.

Return type:

bytes

buf_keep()[source]

If the currently set active buffer range on the _bytes buffer is not the range on this RangeResponse, then set it to be.

This is the mechanism by which windowed ranges are switched (the windows share the same ‘source’ buffer, and the value of the active buffer range stored on that buffer indicates the most recently active window).

At initialisation, all RangeResponse have their active buffer range set to the empty range, Range(0,0).

Return type:

None

check_is_windowed()[source]

Whether the associated request is windowed. Used to set is_windowed on init

Return type:

bool

property client

The request’s client.

close()[source]

Close the associated httpx.Response object. In single request mode, there is just the one (shared with all the ‘windowed’ responses).

property is_active_buf_range: bool

The active range is stored on the buffer the HTTP response stream writes to (in the active_buf_range attribute) so that whenever the active range changes, it is detectable immediately (all interfaces to read/seek/load the buffer are ‘guarded’ by a call to buf_keep() to achieve this).

When this change is detected, since the cursor may be in another range of the shared source buffer (where the previously active window was busy doing its thing), the cursor is first moved to the last stored tell() position, which is stored on each RangeResponse in the told attribute, and initialised as 0 so that on first use it simply refers to the start position of the window range.

Note that the active range only changes for ‘windowed’ RangeResponse objects sharing a ‘source’ buffer with a source _ranges RangeDict. To clarify: the active range changes on first use for non-windowed ranges, since the active range is initialised as the empty range (but after that it doesn’t!)

property is_closed

True if the associated httpx.Response object is closed. For a windowed response in single request mode, this will be shared with any/all other windowed responses on the stream.

is_consumed()[source]

Whether the stream should be considered consumed, as indicated by the tell() position (the bytes ‘consumed’ or ‘read so far’) together with the tail_mark.

The tail_mark is part of a mechanism to ‘shorten’ ranges when an overlap is detected, to preserve the one-to-one integrity of the RangeDict (see notes on the “replant” policy of handle_overlap(), set by the pruning_level passed into RangeStream on initialisation).

Note that there is (absolutely!) nothing stopping a stream from being re-consumed, but this library works on the assumption that all streams will be handled in an efficient manner (with any data read out from them either used only once or else reused from the first output rather than re-accessed directly from the stream itself).

To this end, RangeStream has measures in place to “decommission” ranges once they are consumed (see in particular burn_range() and handle_overlap()).

Return type:

bool
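The relationship between the cursor position and the tail mark can be sketched as follows. This is only an assumed reading of the description above, not the library's implementation: the function name and signature are hypothetical, and the range is reduced to its length.

```python
def is_consumed_sketch(tell: int, range_length: int, tail_mark: int = 0) -> bool:
    """Assumed rule: a range counts as consumed once the window-relative
    cursor reaches the end of the range after trimming `tail_mark` bytes."""
    return tell >= range_length - tail_mark


# A 10 byte range, read up to byte 7
untrimmed = is_consumed_sketch(tell=7, range_length=10)
# The same range with its last 3 bytes 'marked off' by an overlapping range
trimmed = is_consumed_sketch(tell=7, range_length=10, tail_mark=3)
```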

property is_in_window: bool

Whether file cursor is in the window. Trivially true for a non-windowed request, otherwise checks if the file cursor is currently within (or exactly at the end of) the window range.

property name: str

A wrapper to access the name of the ‘parent’ RangeStream.

prepare_reading_window()[source]

Prepare the stream cursor for reading (unclear if this should only be done on initialisation…) Should be done every time if the cursor is shared, but is it?

Return type:

None

read(size=None)[source]

File-like reading within the range request stream, with careful handling of windowed ranges and tail marks.

Return type:

bytes

seek(position, whence=0)[source]

File-like seeking within the range request stream. Synchronous only.

set_active_buf_range(rng)[source]

Update the _bytes buffer’s active_buf_range attribute with the given Range (rng).

Return type:

None

property source_aiterator

The async iterator associated with the source range, for a windowed range.

property source_iterator

The iterator associated with the source range, for a windowed range.

property source_range: Range

Wrapper for window_on_range with a less confusing name to access. Note that this will be the empty range if the request is not a windowed request.

property source_range_response: RangeResponse

The RangeResponse associated with the source range, for a windowed range. Only access this if windowed (if not a windowed range, this will give the RangeResponse associated with the range at position 0, as the default window_on_range value for non-windowed ranges is the empty range [0,0), whose start will be used as the key for the _ranges RangeDict).

store_tell()[source]

Store the [window-relative] tell value in told in the event of any read, seek, or load on the stream, when accessed through the RangeResponse (do not access directly if you want to keep a reliable stored value for told).

Return type:

None

tail_mark: int = 0

The amount by which to shorten the ‘tail’ (i.e. the upper end) of the range when deciding if it is ‘consumed’. Incremented within the handle_overlap() method when the pruning_level is set to 0 (indicating a “replant” policy).

Under a ‘replant’ policy, when a new range is to be added and would overlap at the tail of an existing range, the pre-existing range should be effectively truncated by ‘marking their tails’ (where an existing range is assumed here to only be considered a range if it is not ‘consumed’ yet).
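To make the idea of ‘marking a tail’ concrete, the following sketch computes how many bytes would be trimmed from an existing range when a new range overlaps its upper end. It uses plain (start, end) tuples as stand-ins for half-open Range objects, and the function name is hypothetical; the library's own handle_overlap() may differ.

```python
def tail_mark_for_overlap(existing: tuple[int, int], new: tuple[int, int]) -> int:
    """Bytes to trim from the tail of `existing` so it stops where `new` starts.

    Both arguments are half-open (start, end) intervals. Returns 0 when the
    new range does not begin strictly inside the existing one.
    """
    existing_start, existing_end = existing
    new_start, _ = new
    if existing_start < new_start < existing_end:
        return existing_end - new_start
    return 0


# [0, 10) overlapped at its tail by [7, 12): the last 3 bytes are marked off
mark = tail_mark_for_overlap((0, 10), (7, 12))
```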

tell()[source]

File-like tell (position indicator) within the range request stream.

Return type:

int

tell_abs(live=True)[source]

Get the absolute file cursor position from either the active range response tell (if live is True: default) or the position stored on the active range response (if live is False).

Both are given as absolute positions by adding the window_offset (which is 0 for non-windowed ranges).

Return type:

int

property total_len_to_read
property url: str

A wrapper to access the url of the ‘parent’ RangeStream.

verify_async(msg='')[source]
verify_sync(msg='')[source]
property window_offset: int

Range operations

These tools perform transformations on, or output particular information from, the data structures which store ranges.


range_streams.range_utils.most_recent_range(stream, internal=True)[source]

For all of the RangeResponse values in the RangeDict, list the ranges from their original request in order of registration.

If internal is True, use _ranges as the RangeDict, else use the ‘external’ (computed) property ranges. The external ones take into account the position the file has been read/seeked to.

Return type:

Range | None

range_streams.range_utils.range_len(rng)[source]

Get the length of a Range.

Parameters:

rng (Range) – A Range (which by default will be half-closed, i.e. not inclusive of the end position).

Return type:

int

range_streams.range_utils.range_max(rng)[source]

Get the maximum (or end terminus) of a Range.

Parameters:

rng (Range) – A Range (which by default will be half-closed, i.e. not inclusive of the end position).

Return type:

int

range_streams.range_utils.range_min(rng)[source]

Get the minimum (or start terminus) of a Range.

Parameters:

rng (Range) – A Range (which by default will be half-closed, i.e. not inclusive of the end position).

Return type:

int

range_streams.range_utils.range_span(ranges)[source]

Given a list of Range, calculate their ‘span’ (i.e. the range spanned from their minimum to maximum). This span may of course not be completely ‘covered’ by the ranges in the list.

Assumes the input list of Range is in ascending order, and switches the order if not.

Parameters:

ranges (list[Range]) – A list of ranges whose span is to be given.

Return type:

Range
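A minimal sketch of the span calculation, using (start, end) tuples as stand-ins for half-open Range objects (an assumption made so that the example only needs the standard library; the name range_span_sketch is hypothetical):

```python
def range_span_sketch(ranges: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the half-open interval from the lowest start to the highest end."""
    if not ranges:
        raise ValueError("At least one range is required")
    starts = [start for start, _ in ranges]
    ends = [end for _, end in ranges]
    return (min(starts), max(ends))


# The span [0, 9) is not fully covered: bytes 3 to 6 belong to neither range
span = range_span_sketch([(7, 9), (0, 3)])
```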

range_streams.range_utils.range_termini(rng)[source]

Get the inclusive start and end positions [start,end] from a ranges.Range. These are referred to as the ‘termini’. Ranges are always ascending.

Parameters:

rng (Range) – A Range (which by default will be half-closed, i.e. not inclusive of the end position).

Return type:

tuple[int, int]
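The conversion from a half-open interval to inclusive termini amounts to subtracting one from the end position. The sketch below assumes a non-empty range given as a (start, end) tuple standing in for a Range; the function name is hypothetical.

```python
def termini_sketch(rng: tuple[int, int]) -> tuple[int, int]:
    """Convert a non-empty half-open [start, end) interval to inclusive [start, end - 1]."""
    start, end = rng
    if end <= start:
        raise ValueError("An empty range has no inclusive termini")
    return (start, end - 1)


# The first three bytes of a file: [0, 3) covers positions 0, 1 and 2
termini = termini_sketch((0, 3))
```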

range_streams.range_utils.ranges_in_reg_order(ranges)[source]

Given a RangeDict, list the ranges in order of registration.

Presumes integrity is already checked.

Parameters:

ranges (RangeDict) – Either the internal or external ranges of a RangeStream.

Return type:

list[Range]

range_streams.range_utils.response_ranges_in_reg_order(ranges)[source]

For all of the RangeResponse values in the RangeDict, list the ranges from their original request in order of registration.

Parameters:

ranges (RangeDict) – Either the internal or external ranges of a RangeStream.

Return type:

list[Range]

range_streams.range_utils.validate_range(byte_range, allow_empty=True)[source]

Validate byte_range and convert to a half-closed (i.e. not inclusive of the end position) [start,end) Range if given as integer tuple.

Parameters:

byte_range (Range | tuple[int, int]) – Either a tuple of two int positions with which to create a Range (which by default will be half-closed, i.e. not inclusive of the end position); or simply a Range.

Return type:

Range

Streaming codecs

Codecs for the PNG, ZIP, .conda, and TAR formats, to assist in handling these file types with regard to the information in the header sections defined in their specifications.


class range_streams.codecs.zip.ZipStream(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True, scan_contents=True)[source]

Bases: RangeStream

As for RangeStream, but if scan_contents is True, then immediately call check_central_dir_rec() on initialisation (which will perform a series of range requests to identify the files in the zip from the End of Central Directory Record and Central Directory Record), setting zipped_files, and add() their file content ranges to the stream.

Setting this can be postponed until first access of the filename_list property (this will not add() them to the ZipStream).

Once parsed, the file contents are stored as a list of ZippedFileInfo objects (in the order they appear in the Central Directory Record) in the zipped_files attribute. Each of these objects has a file_range() method which gives the range of its file content bytes within the ZipStream.

__init__(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True, scan_contents=True)[source]

Set up a stream for the ZIP archive at url, with either an initial range to be requested (HTTP partial content request), or if left as the empty range (default: Range(0,0)) a HEAD request will be sent instead, so as to set the total size of the target file on the total_bytes property.

By default (if client is left as None) a fresh httpx.Client will be created for each stream.

The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.

The pruning_level controls the policy for overlap handling (0 will resize overlapped ranges, 1 will delete overlapped ranges, and 2 will raise an error when a new range is added which overlaps a pre-existing range).

If single_request is True (default: False), then passing an empty byte_range sends a standard streaming GET request (not a partial content request at all) instead of a HEAD request. The class then provides an interface that ‘simulates’ range requests, i.e. each add() call behaves as if its range request were returned instantly (as everything needed was already obtained on the first request at initialisation). This is more performant when reading a stream linearly.

Parameters:
  • url (str) – The URL of the file to be streamed

  • client (httpx.Client | None) – The HTTPX client to use for HTTP requests

  • byte_range (Range | tuple[int, int]) – The range of positions on the file to be requested

  • pruning_level (int) – Either 0 (‘replant’), 1 (‘burn’), or 2 (‘strict’)

  • single_request (bool) – Whether to use a single GET request and just add ‘windows’ onto this rather than create multiple partial content requests.

  • force_async (bool) – Whether to require the client to be httpx.AsyncClient, and if no client is given, to create one on initialisation. (Experimental/WIP)

  • chunk_size (int | None) – The chunk size used for the httpx.Response.iter_raw response byte iterators

  • raise_response (bool) – Whether to raise HTTP status code exceptions

  • scan_contents (bool) – Whether to scan the archive contents upon initialisation and add the archive’s file ranges

add_file_ranges()[source]
check_central_dir_rec()[source]

Read the range corresponding to the Central Directory Record (after check_end_of_central_dir_rec() has been called).

check_end_of_central_dir_rec()[source]

Using the stored start position of the End Of Central Directory Record (or calculating and storing it if it is not yet set on the object),

check_end_of_central_dir_start()[source]

If the zip file lacks a comment, the End Of Central Directory Record will be the last thing in it, so taking the range equal to its expected size and checking for the expected start signature will find it.
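The check described here can be tried on an in-memory archive. In this sketch (standard library only, helper names hypothetical) the End Of Central Directory Record is taken to be 22 bytes long with the signature PK\x05\x06, as given in the ZIP format specification; a real stream would fetch the final 22 bytes with a range request instead of slicing.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_SIZE = 22  # fixed size of the record when there is no archive comment


def build_zip(comment: bytes = b"") -> bytes:
    """Create a small ZIP archive in memory, optionally with an archive comment."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", b"hello world")
        archive.comment = comment
    return buffer.getvalue()


def tail_is_eocd(data: bytes) -> bool:
    """Whether the last 22 bytes begin with the expected start signature."""
    return data[-EOCD_SIZE:][:4] == EOCD_SIGNATURE


without_comment = tail_is_eocd(build_zip())
with_comment = tail_is_eocd(build_zip(comment=b"a comment"))
```

With a comment present the record is pushed back from the end of the file, so the simple tail check no longer finds the signature.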

check_head_bytes()[source]
decompress_zipped_file(zf_info, method=None, ext=None)[source]

Given a ZippedFileInfo object zf_info, and (optionally) its compression method [or else detecting that], decompress its bytes from the stream.

Parameters:
  • zf_info (ZippedFileInfo) – The compressed bytes

  • method (str | None) – Compression method (2-3 character abbreviated extension, lower case)

  • ext (str | None) – File extension to treat the bytes in the zf_info range as having (an option if zf_info is not being provided)

property filename_list: list[str]

Return only the file name list from the stored list of 2-tuples of (filename, extra bytes).

get_central_dir_bytes(step=20)[source]

Using the stored start position of the End Of Central Directory Record (or calculating and storing it if it is not yet set on the object), identify the files in the central directory record by searching backwards from the start of the End of Central Directory Record signature until finding the start of the Central Directory Record.
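For comparison, here is a sketch of a different technique from the backwards search described above: it reads the Central Directory offset and entry count stored in the End Of Central Directory Record, then walks the Central Directory file headers to collect file names. It assumes an archive without a comment and without ZIP64 extensions, works on bytes already in memory, and uses hypothetical helper names.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<4s4H2LH")       # 22 byte End Of Central Directory Record
CD_HEADER = struct.Struct("<4s6H3L5H2L")  # 46 byte fixed part of each entry


def central_dir_filenames(data: bytes) -> list[str]:
    """List file names in Central Directory order (no comment, no ZIP64)."""
    signature, _, _, _, n_entries, _, cd_offset, _ = EOCD.unpack(data[-EOCD.size:])
    if signature != b"PK\x05\x06":
        raise ValueError("End Of Central Directory Record not at end of data")
    names = []
    pos = cd_offset
    for _ in range(n_entries):
        fields = CD_HEADER.unpack_from(data, pos)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("Bad Central Directory file header signature")
        name_len, extra_len, comment_len = fields[10:13]
        name_start = pos + CD_HEADER.size
        names.append(data[name_start:name_start + name_len].decode("utf-8"))
        # Skip the variable-length name, extra field and comment
        pos = name_start + name_len + extra_len + comment_len
    return names


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", b"alpha")
    archive.writestr("b/c.txt", b"beta")
filenames = central_dir_filenames(buffer.getvalue())
```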

class range_streams.codecs.conda.CondaStream(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True, scan_contents=True)[source]

Bases: ZipStream

__init__(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True, scan_contents=True)[source]

Set up a stream for the conda (ZIP) archive at url, with either an initial range to be requested (HTTP partial content request), or if left as the empty range (default: Range(0,0)) a HEAD request will be sent instead, so as to set the total size of the target file on the total_bytes property.

By default (if client is left as None) a fresh httpx.Client will be created for each stream.

The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.

The pruning_level controls the policy for overlap handling (0 will resize overlapped ranges, 1 will delete overlapped ranges, and 2 will raise an error when a new range is added which overlaps a pre-existing range).

Parameters:
  • url (str) – The URL of the file to be streamed

  • client (httpx.Client | None) – The HTTPX client to use for HTTP requests

  • byte_range (Range | tuple[int, int]) – The range of positions on the file to be requested

  • pruning_level (int) – Either 0 (‘replant’), 1 (‘burn’), or 2 (‘strict’)

  • single_request (bool) – Whether to use a single GET request and just add ‘windows’ onto this rather than create multiple partial content requests.

  • force_async (bool) – Whether to require the client to be httpx.AsyncClient, and if no client is given, to create one on initialisation. (Experimental/WIP)

  • chunk_size (int | None) – The chunk size used for the httpx.Response.iter_raw response byte iterators

  • raise_response (bool) – Whether to raise HTTP status code exceptions

  • scan_contents (bool) – Whether to scan the archive contents upon initialisation and add the archive’s file ranges

validate_files()[source]

After zipped_files is set (as a list of ZippedFileInfo), validate that they meet the specification of the .conda file format. This means: 1 info-...tar.zst, 1 pkg-...tar.zst, and 1 metadata.json. The simplest way to uniquely identify them is to sort alphabetically by filename and check file prefixes/suffixes.

Return type:

None
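The sort-then-check approach can be sketched on a plain list of file names. The function name and the example file names are hypothetical, and the error type is an assumption; the library's own validation may raise something different.

```python
def validate_conda_filenames(filenames: list[str]) -> None:
    """Check for exactly one info-*.tar.zst, one metadata.json and one pkg-*.tar.zst."""
    if len(filenames) != 3:
        raise ValueError(f"Expected 3 files, found {len(filenames)}")
    # Alphabetical order puts "info-..." first, "metadata.json" second, "pkg-..." last
    info, metadata, pkg = sorted(filenames)
    checks = [
        info.startswith("info-") and info.endswith(".tar.zst"),
        metadata == "metadata.json",
        pkg.startswith("pkg-") and pkg.endswith(".tar.zst"),
    ]
    if not all(checks):
        raise ValueError(f"Not a valid .conda file listing: {filenames}")


# Hypothetical listing, given out of order on purpose
result = validate_conda_filenames(
    ["pkg-example-1.0-0.tar.zst", "metadata.json", "info-example-1.0-0.tar.zst"]
)
```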

class range_streams.codecs.tar.TarStream(url, client=None, byte_range=Range[0, 0), pruning_level=0, scan_headers=True, single_request=False, force_async=False, chunk_size=None, raise_response=True)[source]

Bases: RangeStream

As for RangeStream, but if scan_headers is True, then immediately call check_header_recs() on initialisation (which will perform the necessary range requests to identify the files in the tar from the header record), setting tarred_files, and add() their file content ranges to the stream.

Setting this can be postponed until first access of the filename_list property (this will not add() them to the TarStream).

Once parsed, the file contents are stored as a list of TarredFileInfo objects (in the order they appear in the header record) in the tarred_files attribute. Each of these objects has a file_range() method which gives the range of its file content bytes within the TarStream.

__init__(url, client=None, byte_range=Range[0, 0), pruning_level=0, scan_headers=True, single_request=False, force_async=False, chunk_size=None, raise_response=True)[source]

Set up a stream for the tar archive at url, with either an initial range to be requested (HTTP partial content request), or if left as the empty range (default: Range(0,0)) a HEAD request will be sent instead, so as to set the total size of the target file on the total_bytes property.

By default (if client is left as None) a fresh httpx.Client will be created for each stream.

The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.

The pruning_level controls the policy for overlap handling (0 will resize overlapped ranges, 1 will delete overlapped ranges, and 2 will raise an error when a new range is added which overlaps a pre-existing range).

If single_request is True (default: False), then passing an empty byte_range sends a standard streaming GET request (not a partial content request at all) instead of a HEAD request. The class then provides an interface that ‘simulates’ range requests, i.e. each add() call behaves as if its range request were returned instantly (as everything needed was already obtained on the first request at initialisation). This is more performant when reading a stream linearly.

Parameters:
  • url (str) – The URL of the file to be streamed

  • client (httpx.Client | None) – The HTTPX client to use for HTTP requests

  • byte_range (Range | tuple[int, int]) – The range of positions on the file to be requested

  • pruning_level (int) – Either 0 (‘replant’), 1 (‘burn’), or 2 (‘strict’)

  • scan_headers (bool) – Whether to scan the archive headers upon initialisation and add the archive’s file ranges

  • single_request (bool) – Whether to use a single GET request and just add ‘windows’ onto this rather than create multiple partial content requests.

  • force_async (bool) – Whether to require the client to be httpx.AsyncClient, and if no client is given, to create one on initialisation. (Experimental/WIP)

  • chunk_size (int | None) – The chunk size used for the httpx.Response.iter_raw response byte iterators

  • raise_response (bool) – Whether to raise HTTP status code exceptions

add_file_ranges()[source]
check_header_recs()[source]

Scan through all header records in the file, building a list of TarredFileInfo objects for the files the headers describe (but do not download the corresponding archived file ranges).

For efficiency, only look at the particular fields of interest, not the entire header each time.

property filename_list: list[str]

Return the names of files stored in tarred_files.

read_file_name(start_pos_offset=0)[source]

Return the file name from the header block starting at start_pos_offset (which for the first file will be 0, the default). Tar archives end with at least two empty blocks (i.e. 1024 bytes of padding), but there may be more than that. To catch this possibility, this method will raise a StopIteration error if the file name is NULL (i.e. if what was expected to be a file name is actually padding).

Return type:

str

read_file_size(start_pos_offset=0)[source]

Parse the file size field of the archived file whose header record begins at start_pos_offset.

Return type:

int
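The header fields involved can be illustrated on an in-memory archive. This sketch assumes the ustar layout (512 byte blocks, a 100 byte name field at offset 0 and a 12 byte octal size field at offset 124) and reads from bytes rather than from range requests; the functions mirror the method names above but are standalone stand-ins, not the library's code.

```python
import io
import tarfile

BLOCK = 512


def build_tar(files: dict[str, bytes]) -> bytes:
    """Create a small ustar archive in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_file_name(data: bytes, start_pos_offset: int = 0) -> str:
    raw = data[start_pos_offset:start_pos_offset + 100].rstrip(b"\x00")
    if not raw:
        # An all-NULL name means we have reached the end-of-archive padding
        raise StopIteration("Reached padding instead of a header block")
    return raw.decode("utf-8")


def read_file_size(data: bytes, start_pos_offset: int = 0) -> int:
    field = data[start_pos_offset + 124:start_pos_offset + 136].rstrip(b"\x00 ")
    return int(field.decode("ascii"), 8)  # the size is stored as octal text


def list_members(data: bytes) -> list[tuple[str, int]]:
    members = []
    offset = 0
    while offset + BLOCK <= len(data):
        try:
            name = read_file_name(data, offset)
        except StopIteration:
            break
        size = read_file_size(data, offset)
        members.append((name, size))
        # Skip the header block plus the content, rounded up to whole blocks
        offset += BLOCK + -(-size // BLOCK) * BLOCK
    return members


members = list_members(build_tar({"a.txt": b"alpha", "b.txt": b"x" * 600}))
```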

class range_streams.codecs.png.PngStream(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=True, force_async=False, chunk_size=None, raise_response=True, scan_ihdr=True, enumerate_chunks=True)[source]

Bases: RangeStream

As for RangeStream, but if scan_ihdr is True, then immediately call scan_ihdr() on initialisation (which will perform the necessary range request to read PNG metadata from its IHDR chunk), setting various attributes on the IHDR object.

Populating these attributes can be postponed [until manually calling scan_ihdr() and enumerate_chunks()] to avoid sending any range requests at initialisation.

__init__(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=True, force_async=False, chunk_size=None, raise_response=True, scan_ihdr=True, enumerate_chunks=True)[source]

Set up a stream for the PNG file at url, with either an initial range to be requested (HTTP partial content request), or if left as the empty range (default: Range(0,0)) a HEAD request will be sent instead, so as to set the total size of the target file on the total_bytes property.

By default (if client is left as None) a fresh httpx.Client will be created for each stream.

The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.

The pruning_level controls the policy for overlap handling (0 will resize overlapped ranges, 1 will delete overlapped ranges, and 2 will raise an error when a new range is added which overlaps a pre-existing range).

If single_request is True (default: True), then passing an empty byte_range sends a standard streaming GET request (not a partial content request at all) instead of a HEAD request. The class then provides an interface that ‘simulates’ range requests, i.e. each add() call behaves as if its range request were returned instantly (as everything needed was already obtained on the first request at initialisation). This is more performant when reading a stream linearly, and defaults to True in the PNG codec as chunks are read linearly.

Parameters:
  • url (str) – The URL of the file to be streamed

  • client (httpx.Client | None) – The HTTPX client to use for HTTP requests

  • byte_range (Range | tuple[int, int]) – The range of positions on the file to be requested

  • pruning_level (int) – Either 0 (‘replant’), 1 (‘burn’), or 2 (‘strict’)

  • single_request (bool) – Whether to use a single GET request and just add ‘windows’ onto this rather than create multiple partial content requests.

  • force_async (bool) – Whether to require the client to be httpx.AsyncClient, and if no client is given, to create one on initialisation. (Experimental/WIP)

  • scan_ihdr (bool) – Whether to scan the IHDR chunk on initialisation

  • enumerate_chunks (bool) – Whether to step through each chunk (read its metadata, and proceed until all chunks have been identified) upon initialisation

  • chunk_size (int | None) – The chunk size used for the httpx.Response.iter_raw response byte iterators

  • raise_response (bool) – Whether to raise HTTP status code exceptions

property alpha_as_direct

To avoid distinguishing ‘direct’ image transparency (in IDAT) from ‘indirect’ (or computed, from tRNS) palette transparency, check for a colour map and then check for a tRNS chunk to determine overall whether this image has an alpha channel in whichever way.

any_semitransparent_idat(nonzero=True)[source]

Whether there are any non-255 values in the alpha channel of the PNG, determined from IDAT chunk alone. If not, the alpha channel serves no purpose in practice, and the image may be considered non-transparent.

If nonzero is True (the default), check for semitransparent, rather than nontransparent values (i.e. 0 < A < 255 rather than 0 <= A < 255).

Note: presumes alpha_as_direct() has already been called, so the image is known to have 4 channels.

Parameters:

nonzero (bool) – Whether to return True only if the image has ‘intermediate’ (between 0 and 255) values, otherwise whether they’re below 255.
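The two interpretations of nonzero can be sketched on a flat list of RGBA samples. The function name and the flat R, G, B, A input layout are assumptions made for illustration; the method itself works on decoded IDAT data.

```python
def any_semitransparent(rgba: list[int], nonzero: bool = True) -> bool:
    """Whether any alpha sample is below 255 (and above 0 if `nonzero`)."""
    alpha = rgba[3::4]  # every fourth sample, starting from the first alpha value
    lowest = 1 if nonzero else 0
    return any(lowest <= a < 255 for a in alpha)


opaque = [10, 20, 30, 255, 40, 50, 60, 255]
cutout = [10, 20, 30, 255, 40, 50, 60, 0]    # one fully transparent pixel
blended = [10, 20, 30, 255, 40, 50, 60, 128]  # one half-transparent pixel

results = (
    any_semitransparent(opaque),
    any_semitransparent(cutout),
    any_semitransparent(cutout, nonzero=False),
    any_semitransparent(blended),
)
```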

property bit_depth_as_direct

Indexed images may report an IHDR bit depth other than 8, however the PLTE uses 8 bits per sample regardless of image bit depth, so override it to avoid distinguishing ‘direct’ bit depth from ‘indirect’ palette bit depth.

property channel_count_as_direct

If the image is indexed on a palette, then the channel count in the IHDR will be 1 even though the underlying sample contains 3 channels (R,G,B). To avoid distinguishing ‘direct’ image channels (in IDAT) from ‘indirect’ (or computed, from tRNS) palette channels, check for a colour map and then check for a tRNS chunk to determine overall whether this image has an extra channel for transparency.
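One possible reading of this logic is sketched below. The colour type to channel count mapping follows the PNG specification; treating a palette image as 3 channels, or 4 when a tRNS chunk is present, is an assumption based on the description above, and the standalone function signature is hypothetical.

```python
# Channels per pixel as stored in IDAT, keyed by IHDR colour type
CHANNELS_BY_COLOUR_TYPE = {
    0: 1,  # greyscale
    2: 3,  # truecolour (R, G, B)
    3: 1,  # indexed colour (one palette index per pixel)
    4: 2,  # greyscale with alpha
    6: 4,  # truecolour with alpha
}


def channel_count_as_direct(colour_type: int, has_trns: bool) -> int:
    """Channel count after resolving palette indices to their samples."""
    if colour_type == 3:
        # Palette entries are R, G, B; a tRNS chunk supplies per-entry alpha
        return 4 if has_trns else 3
    return CHANNELS_BY_COLOUR_TYPE[colour_type]


counts = [
    channel_count_as_direct(3, has_trns=False),
    channel_count_as_direct(3, has_trns=True),
    channel_count_as_direct(6, has_trns=False),
]
```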

property chunks

‘Gate’ to the internal _chunks attribute.

If this property is called before the internal attribute is set (‘prematurely’), to avoid an access error it will ‘proactively’ call populate_chunks() before returning the gated internal attribute.

enumerate_chunks()[source]

Parse the length and type chunks, then skip past the chunk data and CRC chunk, so as to enumerate all chunks in the PNG (but request and read as little as possible). Build a dictionary of all chunks with keys of the chunk type (four letter strings) and values of lists (since some chunks e.g. IDAT can appear multiple times in the PNG).

See the official specification for full details (or Wikipedia, or the W3C).

Return type:

dict[str, list[PngChunkInfo]]
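The chunk walk can be tried on a tiny PNG built in memory. This sketch reads from bytes rather than issuing range requests, and records (data start, data length) tuples in place of PngChunkInfo objects, whose fields are not described here; the helper names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Length (4 bytes) + type (4 bytes) + data + CRC over type and data (4 bytes)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def build_png() -> bytes:
    """A 1x1 RGBA image: IHDR, one IDAT and IEND."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)
    scanline = b"\x00" + bytes([255, 0, 0, 128])  # filter byte + one pixel
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", zlib.compress(scanline))
        + make_chunk(b"IEND", b"")
    )


def enumerate_chunks(data: bytes) -> dict[str, list[tuple[int, int]]]:
    chunks: dict[str, list[tuple[int, int]]] = {}
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        # Only the 8 bytes of length and type are inspected for each chunk
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        data_start = pos + 8
        chunks.setdefault(chunk_type.decode("ascii"), []).append((data_start, length))
        pos = data_start + length + 4  # skip the chunk data and the 4 byte CRC
    return chunks


chunks = enumerate_chunks(build_png())
```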

async enumerate_chunks_async()[source]

Parse the length and type chunks, then skip past the chunk data and CRC chunk, so as to enumerate all chunks in the PNG (but request and read as little as possible). Build a dictionary of all chunks with keys of the chunk type (four letter strings) and values of lists (since some chunks e.g. IDAT can appear multiple times in the PNG).

See the official specification for full details (or Wikipedia, or the W3C).

Return type:

dict[str, list[PngChunkInfo]]

get_chunk_data(chunk_info)[source]
Return type:

bytes

get_idat_data()[source]

Decompress the IDAT chunk(s) and concatenate, then confirm the length is exactly equal to height * (1 + width * bit_depth), and filter it (removing the filter byte at the start of each scanline) using reconstruct_idat().

Return type:

list[int]

has_chunk(chunk_type)[source]

Determine whether the given chunk type is one of the chunks defined in the PNG. If the chunks have not yet been parsed, they will first be enumerated.

Return type:

bool

populate_chunks()[source]

Call enumerate_chunks() and store in the internal _chunks attribute, accessible through the chunks property.

If the chunks property is called ‘prematurely’, to avoid an access error it will ‘proactively’ call this method before returning the gated internal attribute.

scan_ihdr()[source]

Request a range on the stream corresponding to the IHDR chunk, and populate the IHDR object (an instance of IHDRChunk from the range_streams.codecs.png.data module) according to the spec.
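As an illustration of what the IHDR range contains, the sketch below unpacks the 13 bytes of IHDR data, which per the PNG specification sit at the half-open byte range [16, 29) of the file. The IHDRFields dataclass and its attribute names are hypothetical and are not the library's IHDRChunk.

```python
import struct
from dataclasses import dataclass

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
IHDR_DATA_RANGE = (16, 29)  # after the 8 byte signature, 4 byte length, 4 byte type


@dataclass
class IHDRFields:
    width: int
    height: int
    bit_depth: int
    colour_type: int
    compression: int
    filter_method: int
    interlacing: int


def parse_ihdr(head: bytes) -> IHDRFields:
    """Unpack the big-endian IHDR fields from the first 29 bytes of a PNG."""
    start, stop = IHDR_DATA_RANGE
    return IHDRFields(*struct.unpack(">IIBBBBB", head[start:stop]))


# The start of a 640x480, 8 bit truecolour PNG (chunk CRC and later chunks omitted)
head = (
    PNG_SIGNATURE
    + struct.pack(">I4s", 13, b"IHDR")
    + struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0)
)
ihdr = parse_ihdr(head)
```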

verify_async(msg='')[source]
verify_sync(msg='')[source]