API Reference
range_streams provides file-like object handling through
an API familiar to users of the standard library
io module. It uses Range, RangeSet,
and RangeDict classes (from the externally maintained
python-ranges library)
to represent and look up range operations in an efficient linked
list data structure.
Servers that support HTTP range requests can serve partial content. This avoids having to consume a file linearly from its start when streaming, and avoids downloading the entire file for non-streaming requests.
A RangeStream is initialised by providing:
- a URL (the file to be streamed)
- (optionally) a client (httpx.Client), or else a fresh one is created
- (optionally) a range, as either:
  - a Range from the python-ranges package [recommended]; or
  - a tuple of integers, presumed to be a half-open interval inclusive of start/exclusive of stop as is common practice in Python, i.e. [start, stop) in interval notation.
If no range (or the empty range) is given, an HTTP HEAD request will be
sent instead of a GET request, to check the total length of the file being streamed.
Either way, the total file length is determined upon initialisation
(total_bytes, also available as total_range, the range spanning
the entire file).
The following example shows the basic setup for a single range.
>>> from ranges import Range
>>> from range_streams import RangeStream, _EXAMPLE_URL
>>> s = RangeStream(url=_EXAMPLE_URL)
>>> rng = Range(0,3)
>>> s.add(rng)
>>> s.ranges
RangeDict{RangeSet{Range[0, 3)}: RangeResponse ⠶ [0, 3) @ 'example_text_file.txt' from raw.githubusercontent.com}
Once a request is made for a non-empty range, the RangeStream
acquires the first entry in the RangeDict stored on the
ranges attribute. This gates access
to the internal _ranges attribute (a RangeDict), and takes
into account whether the bytes in each range’s
RangeResponse are exhausted
or removed due to overlap with another range. See the docs for further details.
Further ranges are requested by simply calling the add()
method with another Range object. To create this implicitly, you can
simply provide a byte range to the add method as a tuple of two integers,
which will be interpreted per the usual convention for ranges in Python,
as an [a,b) half-open interval.
>>> s.add(byte_range=(7,9))
>>> s.ranges
RangeDict{
RangeSet{Range[0, 3)}: RangeResponse ⠶ [0, 3) @ 'example_text_file.txt' from raw.githubusercontent.com,
RangeSet{Range[7, 9)}: RangeResponse ⠶ [7, 9) @ 'example_text_file.txt' from raw.githubusercontent.com
}
Codecs are available for .zip (ZipStream) and .conda
(CondaStream) archives, which will read and
name the ranges corresponding to the archive’s contents file list upon initialisation.
>>> from range_streams import _EXAMPLE_ZIP_URL
>>> from range_streams.codecs import ZipStream
>>> s = ZipStream(url=_EXAMPLE_ZIP_URL)
>>> s.ranges
RangeDict{
RangeSet{Range[51, 62)}: RangeResponse ⠶ "example_text_file.txt" [51, 62) @ 'example_text_file.txt.zip' from raw.githubusercontent.com
}
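The range shown for the archive member can be reproduced with the standard library alone. The following is a minimal sketch, not ZipStream's implementation: it builds a small in-memory archive (the content b"Hello world" and the helper name member_data_range are assumptions made for illustration) and locates the member's data by skipping the 30-byte local file header plus the filename and extra field.

```python
import io
import struct
import zipfile

# Build a small in-memory archive (a stand-in for a remote zip file).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("example_text_file.txt", b"Hello world")
archive = buf.getvalue()

LOCAL_HEADER_SIZE = 30  # fixed-size part of a ZIP local file header


def member_data_range(archive_bytes, info):
    """Return the half-open (start, stop) byte range of a member's data."""
    header_end = info.header_offset + LOCAL_HEADER_SIZE
    header = archive_bytes[info.header_offset:header_end]
    # The final two little-endian uint16 fields of the fixed header are the
    # filename length and the extra field length.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    start = header_end + name_len + extra_len
    return (start, start + info.compress_size)


with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    info = zf.getinfo("example_text_file.txt")
    start, stop = member_data_range(archive, info)

print((start, stop), archive[start:stop])
```

With a 21-character filename and no extra field, the data begins at byte 51, which is why only that range needs to be requested to read the member.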
The .conda format is just a particular type of zip for Python packages on the conda
package manager (containing JSON and Zstandard-compressed tarballs):
>>> from range_streams.codecs import CondaStream
>>> EXAMPLE_CONDA_URL = "https://repo.anaconda.com/pkgs/main/linux-64/progressbar2-3.34.3-py27h93d0879_0.conda"
>>> s = CondaStream(url=EXAMPLE_CONDA_URL)
>>> s.ranges
RangeDict{
RangeSet{Range[77, 6427)}: RangeResponse ⠶ "info-progressbar2-3.34.3-py27h93d0879_0.tar.zst" [77, 6427) @ 'progressbar2-3.34.3-py27h93d0879_0.conda' from repo.anaconda.com
RangeSet{Range[6503, 39968)}: RangeResponse ⠶ "pkg-progressbar2-3.34.3-py27h93d0879_0.tar.zst" [6503, 39968) @ 'progressbar2-3.34.3-py27h93d0879_0.conda' from repo.anaconda.com
RangeSet{Range[40011, 40042)}: RangeResponse ⠶ "metadata.json" [40011, 40042) @ 'progressbar2-3.34.3-py27h93d0879_0.conda' from repo.anaconda.com
}
Note that unlike zips, tarballs use solid compression, meaning they are not amenable to range requests (you could make them, but to my understanding there would be no benefit).
A further codec handles PNG images (a file format composed of ‘chunks’ of different types). The metadata can be identified by looking in the IHDR chunk and checking for the presence of other chunks. Some properties are made available ‘as direct’ (i.e. reliably, regardless of the specific PNG compression), mimicking the approach of the PyPNG library.
>>> from range_streams import _EXAMPLE_PNG_URL
>>> from range_streams.codecs import PngStream
>>> s = PngStream(url=_EXAMPLE_PNG_URL)
>>> s.alpha_as_direct
True
>>> s.channel_count_as_direct
4
>>> s.chunks
{'IHDR': [PngChunkInfo :: {'data_range': Range[16, 29), 'end': 33, 'length': 13, 'start': 8, 'type': 'IHDR'}],
'zTXt': [PngChunkInfo :: {'data_range': Range[41, 1887), 'end': 1891, 'length': 1846, 'start': 33, 'type': 'zTXt'}],
'iCCP': [PngChunkInfo :: {'data_range': Range[1899, 2287), 'end': 2291, 'length': 388, 'start': 1891, 'type': 'iCCP'}],
'bKGD': [PngChunkInfo :: {'data_range': Range[2299, 2305), 'end': 2309, 'length': 6, 'start': 2291, 'type': 'bKGD'}],
'pHYs': [PngChunkInfo :: {'data_range': Range[2317, 2326), 'end': 2330, 'length': 9, 'start': 2309, 'type': 'pHYs'}],
'tIME': [PngChunkInfo :: {'data_range': Range[2338, 2345), 'end': 2349, 'length': 7, 'start': 2330, 'type': 'tIME'}],
'tEXt': [PngChunkInfo :: {'data_range': Range[2357, 2382), 'end': 2386, 'length': 25, 'start': 2349, 'type': 'tEXt'}],
'IDAT': [PngChunkInfo :: {'data_range': Range[2394, 5108), 'end': 5112, 'length': 2714, 'start': 2386, 'type': 'IDAT'}],
'IEND': [PngChunkInfo :: {'data_range': Range[5120, 5120), 'end': 5124, 'length': 0, 'start': 5112, 'type': 'IEND'}]}
>>> s.data.IHDR
IHDRChunk :: {'bit_depth': 8, 'channel_count': 4, 'colour_type': 6, 'compression': 0, 'end_pos': 29, 'filter_method': 0, 'height': 100, 'interlacing': 0, 'start_pos': 16, 'struct': '>IIBBBBB', 'width': 100}
>>> s.get_idat_data()[:4]
[153, 0, 0, 255]
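The start, data_range, length and end values in the chunk listing follow from the PNG chunk layout: a 4-byte length, a 4-byte type, the data, then a 4-byte CRC, all after an 8-byte signature. The sketch below is not PngStream's code; it builds a tiny 1x1 RGBA image in memory (the helper names make_chunk and walk_chunks are hypothetical) and walks its chunks to show the offset arithmetic.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 8 bytes, precedes the first chunk


def make_chunk(chunk_type, data):
    """Serialise one chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def walk_chunks(png):
    """List each chunk's offsets, using half-open (start, stop) data ranges."""
    pos = len(PNG_SIGNATURE)
    found = []
    while pos < len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        data_start = pos + 8            # skip the length and type fields
        data_end = data_start + length
        end = data_end + 4              # skip the CRC
        found.append({
            "type": chunk_type.decode("ascii"),
            "start": pos,
            "data_range": (data_start, data_end),
            "length": length,
            "end": end,
        })
        pos = end
    return found


# IHDR fields packed as '>IIBBBBB': width, height, bit depth, colour type,
# compression, filter method, interlacing.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)
idat = zlib.compress(b"\x00" + bytes([153, 0, 0, 255]))  # filter byte + 1 pixel
png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

chunks = walk_chunks(png)
print(chunks[0])
```

Because each chunk's position depends on the length of the one before it, chunks are naturally discovered in order from the start of the file.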
Range streams
This class represents a file being streamed as a sequence of non-overlapping ranges.
range_streams.stream exposes a class
RangeStream, whose key property (once initialised) is
ranges,
which provides a RangeDict comprising the ranges of
the file being streamed.
The method add() will request further ranges,
and (unlike the other methods in this module) will accept a tuple of two integers as its
argument (byte_range).
- class range_streams.stream.RangeStream(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True)[source]
Bases: object
A class representing a file being streamed from a server which supports range requests, with the ranges property providing a list of those intervals requested so far (and not yet exhausted).
When the class is initialised its length is checked upon the first range request, and the client provided is not closed (you must handle this yourself). Further ranges may be requested on the RangeStream by calling add().
Both the __init__() and add() methods support the specification of a range interval as either a tuple of two integers or a Range from the python-ranges package (an external requirement installed alongside this package). Either way, the interval created is interpreted to be the standard Python convention of a half-open interval [start, stop).
Don’t forget to close the httpx.Response yourself! The close() method is available (or close()) to help you.
- __init__(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True)[source]
Set up a stream for the file at url, with either an initial range to be requested (HTTP partial content request), or if left as the empty range (default: Range(0,0)) a HEAD request will be sent instead, so as to set the total size of the target file on the total_bytes property.
By default (if client is left as None) a fresh httpx.Client will be created for each stream.
The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.
If byte_range is passed as the empty range Range(0,0) (its default), then a HEAD request is sent to url on initialisation, setting the total_bytes value from the content-length header in the subsequent response.
If single_request is True (default: False), then the behaviour when an empty byte_range is passed instead becomes to send a standard streaming GET request (not a partial content request at all), and instead the class will then facilitate an interface that ‘simulates’ these calls, i.e. as if each time add() was used the range requests were being returned instantly (as everything needed was already obtained on the first request at initialisation). More performant when reading a stream linearly.
Note: internally, this single request is known as ‘the monostream’, and is stored on the monostream property.
Note: a single request will not be as efficient if streaming the response non-linearly (since reading a byte in the stream requires loading all bytes up to it). This will mean it is only performant to use for certain file types or applications (e.g. a ZIP file is read “in a principled manner” from the end [the Central Directory] first, so gains greatly from using multiple partial content requests rather than a single stream, whereas a PNG file can only be read “in a principled manner” linearly, iterating through the chunks from the start).
The pruning_level controls the policy for overlap handling (0 will resize overlapped ranges, 1 will delete overlapped ranges, and 2 will raise an error when a new range is added which overlaps a pre-existing range).
The chunk_size controls the size of the chunks that are read in from the httpx.Response.iter_raw iterator on the streamed HTTP response.
See docs for the handle_overlap() method for further details.
- Parameters:
client – (httpx.Client | None) The HTTPX client to use for HTTP requests
byte_range (Range | tuple[int, int]) – The range of positions on the file to be requested
pruning_level (int) – Either 0 (‘replant’), 1 (‘burn’), or 2 (‘strict’)
single_request (bool) – Whether to use a single GET request and just add ‘windows’ onto this rather than create multiple partial content requests.
force_async (bool) – (bool | None) Whether to require the client to be httpx.AsyncClient, and if no client is given, to create one on initialisation. (Experimental/WIP)
chunk_size (int | None) – The chunk size used for the httpx.Response.iter_raw response byte iterators
raise_response (bool) – Whether to raise HTTP status code exceptions
- _active_range: Range | None = None
Set by set_active_range(), through which the active_range_response property gives access to the currently ‘active’ range (usually the most recently created).
- _ranges: RangeDict
‘Internal’ ranges attribute. Start position is not affected by reading in bytes from the RangeResponse (unlike the ‘external’ ranges property)
- async aclose()[source]
Close any httpx.Response on the async stream. In single request mode, there is just the one (shared with all the ‘windowed’ responses).
- Return type:
- property active_range_response: RangeResponse
Look up the RangeResponse object associated with the currently active range by using _active_range as the Range key for the internal _ranges RangeDict.
Look it up in the _ranges RangeDict instead if in single request mode.
- add(byte_range=Range[0, 0), activate=True, name='')[source]
Add a range to the stream. If it is empty and the length of the stream has not already been determined, this will initiate a HEAD request to check the file’s total size. In all other cases, only add the Range to the RangeDict of ranges, set up a streaming partial content GET request, but do not try to read any bytes from it (so response data will be downloaded upon creation).
The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.
If activate is True, make this range the active range upon adding it to the stream (allowing access to the associated response through the active_range_response property).
If a name is provided (used in subclasses where the stream is an archive with individually named files within it), assign this name to the RangeResponse (as its range_name argument).
- Parameters:
byte_range (Range | tuple[int, int]) – The range of positions on the file to be requested and stored in the RangeDict on ranges
activate (bool) – Whether to make this newly added Range the active range on the stream upon creating it.
name (str) – A name (default: '') to give to the range.
- Return type:
None
- add_window(byte_range=Range[0, 0), activate=True, name='')[source]
Register a window onto the original range in the _ranges RangeDict rather than add a new range entry to the dict (which would A) clash with the single entire range B) require another request).
- Parameters:
byte_range (Range | tuple[int, int]) – The range of positions on the file to be read from the request on the stream.
activate (bool) – Whether to make this newly added Range the active range on the stream upon creating it.
name (str) – A name (default: '') to give to the range.
- Return type:
None
- burn_range(overlapped_ext_rng)[source]
Get the internal range (i.e. without offsets applied from the current read position on the range) from the external one (which may differ if the seek position has advanced from the start position, usually due to reading bytes from the range). Once this internal range has been identified, delete it, and set the _active_range to the most recent (or if the stream becomes empty, set it to None).
- Parameters:
overlapped_ext_rng (Range) – the overlapped external range
- check_range_integrity(use_windows=False)[source]
Every RangeSet in the _ranges RangeDict keys must contain 1 Range each
- Return type:
- check_response_length(headers, req)[source]
Return the length of the response from its content-length header (after checking it contains this header, else raising KeyError), as an integer.
- property client_is_async
- close()[source]
Close any httpx.Response on the stream. In single request mode, there is just the one (shared with all the ‘windowed’ responses).
- Return type:
- compute_external_ranges(use_windows=False)[source]
If use_windows is True, the internal_range_dict is _range_windows; when use_windows is False (the default) it is _ranges.
Produces a view on the internal_range_dict, modified to account for the bytes consumed (from the head) and tail mark offset of where a range was already trimmed to avoid an overlap (from the tail).
While the RangeSet keys are a deep copy of the internal_range_dict RangeDict keys (and therefore will not propagate if modified), the RangeResponse values are references, therefore will propagate to the internal_range_dict RangeDict if modified (primarily when read).
When use_windows is True, these RangeResponse values are ‘simulations’ (a.k.a. mock/dummy objects) of the range response that would be received from a partial content request (they in fact merely came from a streamed GET request).
- Return type:
- ext2int(ext_rng)[source]
Given the external range ext_rng and the RangeStream on which it is ‘stored’ (or rather, computed, in the ranges property), return the internal Range stored on the _ranges attribute of the RangeStream, by looking up the shared RangeResponse value.
- property freely_requestable
Trivial opposite of the single_request attribute, so that conditional blocks can treat this as the ‘conventional’ case and the single request case be the alternative (which looks better).
- async get_async_monostream()[source]
Send a streaming GET request with an open-ended content-range header, to obtain the total range. Suitable for higher performance (to avoid repeated requests on the RangeStream which accrue a time cost).
Should be called after the RangeStream is initialised (with both single_request and force_async as True), and [unlike the initialisation method] of course this method must be awaited.
- Return type:
- get_monostream()[source]
Send a streaming GET request with an open-ended content-range header, to obtain the total range. Suitable for higher performance (to avoid repeated requests on the RangeStream which accrue a time cost).
Called at initialisation (within the first) when single_request is passed to RangeStream as True.
- Return type:
- handle_overlap(rng, internal=False, use_windows=False)[source]
Handle overlaps with a given pruning level:
0. “replant” ranges overlapped at the head with fresh, disjoint ranges ‘downstream’ or mark their tails to effectively truncate them if overlapped at the tail
1. “burn” existing ranges overlapped anywhere by the new range
2. “strict” will throw a ValueError
- Return type:
None
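The three policies can be illustrated on plain (start, stop) tuples. This is a simplified sketch under stated assumptions rather than the library's handle_overlap(): the function add_with_pruning and its constants are hypothetical names, ranges are half-open tuples instead of Range objects, and a new range that falls strictly inside an existing one is treated here like a tail overlap.

```python
REPLANT, BURN, STRICT = 0, 1, 2  # pruning levels as described above


def overlaps(a, b):
    """True if the half-open ranges a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def add_with_pruning(existing, new, pruning_level=REPLANT):
    """Return a sorted list of disjoint ranges after adding `new`."""
    kept = []
    for rng in existing:
        if not overlaps(rng, new):
            kept.append(rng)
        elif pruning_level == STRICT:
            raise ValueError(f"{new} overlaps existing range {rng}")
        elif pruning_level == BURN:
            continue  # discard the overlapped range entirely
        else:  # REPLANT
            if new[0] <= rng[0]:
                # Overlapped at the head: keep only the part downstream of `new`.
                trimmed = (new[1], rng[1])
            else:
                # Overlapped at the tail (assumption: also used for the body):
                # truncate the existing range where `new` begins.
                trimmed = (rng[0], new[0])
            if trimmed[0] < trimmed[1]:
                kept.append(trimmed)
    return sorted(kept + [new])


print(add_with_pruning([(0, 5)], (3, 8)))         # tail of (0, 5) truncated
print(add_with_pruning([(5, 10)], (3, 7)))        # head of (5, 10) replanted
print(add_with_pruning([(0, 5)], (3, 8), BURN))   # (0, 5) removed
```

Whatever the policy, the result keeps ranges disjoint, which is what allows a position to map to at most one range.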
- property is_closed
True if the httpx.Response object(s) associated with the RangeResponse values in the internal _ranges RangeDict is/are all closed.
- isempty()[source]
Whether the internal _ranges RangeDict is empty (contains no range-RangeResponse key-value pairs).
- Return type:
- list_ranges()[source]
Retrieve ascending order list of RangeSet keys, as a list of Range.
The RangeSet to Range transformation is permitted because the ranges property method begins by checking range integrity, which requires each RangeSet to be a singleton set (of a single Range).
If activate is True (the default), the range will be made the active range of the RangeStream upon being registered (if it meets the criteria for registration).
If pruning_level is 0 then overlaps are handled using a “replant” policy (redefine and overwrite the existing range to be disjoint when the new range would overlap it), if it’s 1 they are handled with a “burn” policy (simply dispose of the existing range to eliminate any potential overlap), and if it’s 2 using a “strict” policy (raising errors upon detecting overlap).
- classmethod make_async_fetcher(urls, callback=None, verbose=False, show_progress_bar=True, timeout_s=5.0, client=None, close_client=False, **kwargs)[source]
- property ranges
Read-only view on the RangeDict stored in the _ranges attribute, modifying it to account for the bytes consumed (from the head) and tail mark offset of where a range was already trimmed to avoid an overlap (from the tail).
Each ranges RangeDict key is a RangeSet containing 1 Range. Check this assumption (singleton RangeSet “integrity”) holds and retrieve this list of RangeSet keys in ascending order, as a list of Range.
Requests are restricted to not re-request already-requested file ranges, so give windows onto the underlying range that can be consumed (but the underlying RangeResponse will persist and cannot be consumed by reading).
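The relationship between an internal range and the external view can be written as simple arithmetic. The sketch below is an illustration only: external_range is a hypothetical helper, ranges are half-open (start, stop) tuples, and it assumes the head advances by the number of bytes consumed while the tail retreats by the tail mark.

```python
def external_range(internal, consumed=0, tail_mark=0):
    """Shrink an internal (start, stop) range by bytes read and the tail mark."""
    start, stop = internal
    ext_start = start + consumed   # bytes already read from the head
    ext_stop = stop - tail_mark    # bytes trimmed from the tail after an overlap
    if ext_start > ext_stop:
        raise ValueError("consumed bytes and tail mark exceed the range length")
    return (ext_start, ext_stop)


print(external_range((0, 10)))                           # untouched range
print(external_range((0, 10), consumed=3))               # 3 bytes read
print(external_range((0, 10), consumed=3, tail_mark=2))  # also trimmed by 2
```

When the external start meets the external stop, nothing is left to read, so the range no longer needs to appear in the external view.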
- send_head_request()[source]
Send a ‘plain’ HEAD request without range headers, to check the total content length without creating a RangeRequest (simply discard the response as it can only be associated with the empty range, which cannot be stored in a RangeDict), raising for status ASAP. To be used when initialised with an empty byte range. If the range_streams.stream.RangeStream.client is asynchronous, use a synchronous client (created for this single request).
- Return type:
- set_active_range(rng)[source]
Setter for the active range (through which active_range_response is also set).
- set_client(client, force_async)[source]
Check client type explicitly to handle a/sync and optional HTTPX client.
- Parameters:
client – (httpx.Client | httpx.AsyncClient | None) The client to be used for all HTTP requests made on the range_streams.stream.RangeStream. If None, a fresh one will be created.
force_async (bool) – If the client is None, this parameter determines whether httpx.Client or httpx.AsyncClient is set as the client. If a synchronous client is given and force_async is True, an error will be raised.
- Return type:
- simulate_request(byte_range, parent_range_request=None)[source]
Simulate the RangeRequest obtained from a partial content request for byte_range on the stream’s URL through a “window” on range_request (expected to be a streamed GET request for the full file range).
If no parent_range_request is provided, it is assumed to be the one on the RangeResponse in the internal _ranges RangeDict.
- Parameters:
byte_range (Range) – The Range to simulate a partial content request for.
parent_range_request (RangeRequest | None) – The RangeRequest over which to use a “window” to simulate the range request.
- Return type:
- property sync_client
Provide a synchronous client: either the stream’s client, or a fresh one if the stream’s client is asynchronous. Used for HEAD requests on an async RangeStream. Presumes a client has been set correctly.
HTTP request helper functions
These helper functions help prepare HTTP requests to set up a stream.
When preparing a HTTP GET request, the HTTP range request
header must be provided as a dict, for example:
{"range": "bytes=0-1"}
would request the two bytes at positions 0 and 1 (i.e. the inclusive
interval [0,1]).
An empty range can also be specified with the value bytes="-0", which is useful to
determine the total length of a file (as the Content-Range header
returned by the server contains the total size of the file from which the range was taken).
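Because HTTP byte ranges are inclusive at both ends while Python intervals are half-open, the stop position shifts by one in each direction of the conversion. The sketch below shows only that arithmetic for non-empty ranges; the function names are hypothetical and are not the helpers documented in this module.

```python
def half_open_to_range_header(start, stop):
    """Convert a non-empty half-open [start, stop) interval to a range header."""
    if start < 0 or stop <= start:
        raise ValueError("this sketch only handles non-empty, non-negative ranges")
    # HTTP ranges are inclusive, so the last byte requested is stop - 1.
    return {"range": f"bytes={start}-{stop - 1}"}


def range_header_to_half_open(value):
    """Convert a 'bytes=first-last' header value back to (start, stop)."""
    unit, _, spec = value.partition("=")
    if unit != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    first, _, last = spec.partition("-")
    return (int(first), int(last) + 1)


print(half_open_to_range_header(0, 2))
print(range_header_to_half_open("bytes=0-1"))
```

Keeping the half-open convention in Python code means a range's length is simply stop minus start, and adjacent ranges share a boundary without overlapping.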
- exception range_streams.http_utils.PartialContentStatusError(*, request, response)[source]
Bases: Exception
The response had any HTTP status code other than 206 (Partial Content).
May be raised when calling raise_for_non_partial_content()
- range_streams.http_utils.byte_range_from_range_obj(rng)[source]
Prepare the byte range substring for a HTTP range request.
For example:
>>> from ranges import Range
>>> from range_streams.http_utils import byte_range_from_range_obj
>>> byte_range_from_range_obj(Range(0,2))
'0-1'
- Parameters:
rng (Range) – range of the bytes to be requested (0-based)
- Return type:
- Returns:
A hyphen-separated string of start and end positions. The start position is missing if the range provided is empty, and this corresponds to a request for “the last zero bytes” i.e. an empty range request.
- range_streams.http_utils.detect_header_value(headers, key, source='Response')[source]
Detect a title case, lower case, or capitalised version of the given string.
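One way to picture this lookup is to try each casing of the key in turn against a plain dict of headers. This is a hedged sketch, not the library's detect_header_value(): the name find_header is hypothetical, and raising KeyError when no variant matches is an assumption.

```python
def find_header(headers, key):
    """Return the value for the first casing variant of `key` found in `headers`."""
    # Title case ("Content-Length"), lower case ("content-length"),
    # and capitalised ("Content-length") variants are tried in that order.
    for candidate in (key.title(), key.lower(), key.capitalize()):
        if candidate in headers:
            return headers[candidate]
    raise KeyError(f"no casing of {key!r} found in headers")


print(find_header({"Content-Length": "11"}, "content-length"))
print(find_header({"content-length": "11"}, "Content-Length"))
```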
- range_streams.http_utils.range_header(rng)[source]
Prepare a dict to pass as a httpx request header with a single key range whose value is the byte range.
For example:
>>> from ranges import Range
>>> from range_streams.http_utils import range_header
>>> range_header(Range(0,2))
{'range': 'bytes=0-1'}
>>> range_header(Range(0,0))
{'range': 'bytes=0-'}
- Parameters:
rng (Range) – range of the bytes to be requested (0-based)
- Return type:
- Returns:
dict suitable to be passed to httpx.Client.build_request in setup_stream() through range_header
Asynchronous fetcher
This helper class handles all of the details of asynchronously fetching streams, given a list of URLs.
- class range_streams.async_utils.AsyncFetcher(stream_cls, urls, callback=None, verbose=False, show_progress_bar=True, timeout_s=5.0, client=None, close_client=False, **kwargs)[source]
Bases: object
- async async_fetch_urlset(urls)[source]
If the client is None, create one in a contextmanager block (i.e. close it immediately after use), otherwise use the one provided, not in a contextmanager block (i.e. leave it up to the user to close the client).
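The ownership rule described here (close a client you created, leave a client you were given) can be sketched with a stand-in client. Everything below is hypothetical and for illustration only: DummyAsyncClient is not httpx, and fetch_urlset is not the AsyncFetcher method.

```python
import asyncio


class DummyAsyncClient:
    """Stand-in for an async HTTP client (hypothetical, not httpx)."""

    def __init__(self):
        self.is_closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.is_closed = True

    async def get(self, url):
        await asyncio.sleep(0)  # yield control, as a real request would
        return f"fetched {url}"


async def fetch_urlset(urls, client=None):
    if client is None:
        # No client supplied: create one and close it as soon as we are done.
        async with DummyAsyncClient() as fresh_client:
            results = await asyncio.gather(*(fresh_client.get(u) for u in urls))
        return results, fresh_client
    # Client supplied: use it, and leave closing it to the caller.
    results = await asyncio.gather(*(client.get(u) for u in urls))
    return results, client


urls = ["https://example.com/a", "https://example.com/b"]
own_results, own_client = asyncio.run(fetch_urlset(urls))
shared = DummyAsyncClient()
shared_results, same_client = asyncio.run(fetch_urlset(urls, client=shared))
print(own_client.is_closed, shared.is_closed)
```

Leaving a supplied client open lets the caller reuse its connections across several fetches, at the cost of having to close it explicitly.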
- complete_row(row_index)[source]
Add the range corresponding to the range at row row_index to the completed RangeSet, meaning it will be omitted on any further call to make_calls(). This should be done to indicate the URL at that row has been processed (either successfully or unsuccessfully, e.g. it gave a 404).
- Return type:
- async fetch(client, url)[source]
- Parameters:
client – httpx.AsyncClient
url – httpx.URL
- Return type:
TypeVar(_T, bound= range_streams.stream.RangeStream)
- make_calls()[source]
The method called to run the event loop to fetch URLs, after initialisation and/or repeatedly upon exiting the loop (i.e. it can recover from errors).
- mark_url_complete(url)[source]
Add the row index for the given URL in the url_list to the completed RangeSet, meaning it will be omitted on any further call to make_calls(). This should be done to indicate the URL has been processed (either successfully or unsuccessfully, e.g. it gave a 404).
- Return type:
- exception range_streams.async_utils.SignalHaltError(signal_enum)[source]
Bases: SystemExit
Overlap handling
These helper functions report on/handle the various possible ways ranges can overlap, and the actions taken if an overlap is found.
- range_streams.overlaps.get_range_containing(rng_dict, position)[source]
Get a Range from rng_dict by looking up the position it contains, where rng_dict is either the internal RangeStream._ranges attribute or the external ranges property.
Presumes range integrity has been checked.
Raises ValueError if position is not in rng_dict.
- range_streams.overlaps.overlap_whence(rng_dict, rng)[source]
Determine whether any overlap exists and, if so, whence (i.e. from where) on the pre-existing range it overlapped.
0 if the new range overlapped at the start (‘head’) of the existing range, 1 if fully contained (in the ‘body’), 2 if at the end (‘tail’), or None if the range is non-overlapping with any pre-existing range.
Note: same convention as Python io module’s SEEK_SET, SEEK_CUR, and SEEK_END.
- Return type:
int | None
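The head/body/tail classification can be illustrated for a single pair of half-open (start, stop) tuples. This is a sketch under assumptions, not the library's overlap_whence(): whence_of_overlap is a hypothetical name, it compares against one existing range rather than a RangeDict, and a new range that covers the existing one entirely is reported as a head overlap.

```python
HEAD, BODY, TAIL = 0, 1, 2  # same numbering as SEEK_SET, SEEK_CUR, SEEK_END


def whence_of_overlap(existing, new):
    """Classify where `new` overlaps `existing`, or return None if it does not."""
    (a, b), (c, d) = existing, new
    if d <= a or c >= b:
        return None   # disjoint (touching endpoints do not overlap)
    if a <= c and d <= b:
        return BODY   # new range lies fully inside the existing one
    if c < a:
        return HEAD   # new range runs into the start of the existing one
    return TAIL       # new range starts inside and runs past the end


print(whence_of_overlap((5, 10), (3, 7)))    # head
print(whence_of_overlap((5, 10), (6, 8)))    # body
print(whence_of_overlap((5, 10), (8, 12)))   # tail
print(whence_of_overlap((5, 10), (0, 5)))    # no overlap
```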
Requests and responses
These classes facilitate the streaming of data from a URL, and handling the response as a file-like object.
- class range_streams.request.RangeRequest(byte_range, url, client, GET_got=None, window_on_range=Range[0, 0), chunk_size=None)[source]
Bases: object
Store a GET request and the response stream while keeping a reference to the client that spawned it, providing an overridable _iterator attribute [by default giving access to iter_raw()] on the underlying httpx.Response, suitable for RangeResponse to wrap in a io.BytesIO buffered stream. For async clients, _aiterator is set instead [giving access to aiter_raw()].
- async aiter_raw()[source]
Wrap the iter_raw() method of the underlying httpx.Response object within the RangeResponse in response.
- Return type:
- property aiterator_initialised
- async await_aiterator()[source]
Initialise the async iterator on the _aiterator attribute from the stored function which when called returns the typing.AsyncIterator[bytes].
- Return type:
- check_client()[source]
Type checking workaround (Sphinx type hint extension does not like httpx so check the type manually with a method called at initialisation).
- property client_is_async
- close()[source]
Close the response RangeResponse.
- Return type:
- content_range_header()[source]
Validate that the request was a range request by the presence of the content-range header.
- Return type:
- classmethod from_get_stream(byte_range, client, req, resp, chunk_size=None)[source]
Avoid making a new partial content request, instead interpret a streaming GET request as one when provided along with a byte_range.
Does not call raise_for_non_partial_content() as is done after setting the request and response in setup_stream().
Note: req and resp are type checked ‘manually’ at init (not via type hints) due to Sphinx type hints bug with the httpx library.
- Parameters:
byte_range (Range) – The Range provided by this request.
req – The sent httpx.Request
resp – The received httpx.Response
chunk_size (int | None) – The size of chunks to read the response into the buffer with
- Return type:
- iter_raw()[source]
Wrap the iter_raw() method of the underlying httpx.Response object within the RangeResponse in response.
- raise_for_non_partial_content()[source]
Raise the PartialContentStatusError if the response status code is anything other than 206 (Partial Content), as that is what was requested.
- property range_header
- setup_stream()[source]
client.stream("GET", url) but leave the stream to be manually closed rather than using a context manager
- Return type:
- property total_content_length: int
Obtain the total content length from the content-range header of a partial content HTTP GET request. This method is not used for the HTTP HEAD request sent when a RangeStream is initialised with an empty Range (since that is not a partial content request it returns a content-length header which can be read as an integer directly).
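A Content-Range value on a 206 response has the form "bytes first-last/total", so the total length is the number after the slash. The sketch below parses only that common form (it does not handle an unknown total written as "*"); the function name is hypothetical and this is not the property's actual implementation.

```python
import re

# Matches e.g. "bytes 0-1/1234": inclusive first and last positions, then total.
CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def total_length_from_content_range(value):
    """Extract the total file length from a Content-Range header value."""
    match = CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised Content-Range value: {value!r}")
    first, last, total = (int(group) for group in match.groups())
    return total


print(total_length_from_content_range("bytes 0-1/1234"))
```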
- classmethod windowed_request(byte_range, range_request, tail_mark, chunk_size)[source]
Reuse the stream from an existing streaming request (rather than making a new request) to create a new ‘windowed’ RangeRequest from an existing RangeRequest, but change the byte range to be used on it. If the existing RangeRequest (range_request) is anything other than a stream of the full file range, then relative ranges will need to be calculated. This constructor was written on the assumption of a full file range.
- Parameters:
byte_range (Range) – The Range provided by this request.
on_request – The sent httpx.Request
tail_mark (int) – The tail_mark to trim the byte_range (if any). Passed separately
chunk_size (int | None) – The chunk size to the httpx.Response.iter_raw iterator (or httpx.Response.aiter_raw if using an async client)
- Return type:
- class range_streams.response.RangeResponse(stream, range_request, range_name='')[source]
Bases: object
Adapted from obskyr’s ResponseStream demo code, this class handles the streamed partial request as a file-like object.
Don’t forget to close the httpx.Response yourself! The close() method is available (or close()) to help you.
- async aclose()[source]
Close the associated httpx.Response object. In single request mode, there is just the one (shared with all the ‘windowed’ responses).
- async aread(size=None)[source]
File-like reading within the range request stream, with careful handling of windowed ranges and tail marks.
- Return type:
- buf_keep()[source]
If the currently set active buffer range on the _bytes buffer is not the range on this RangeResponse, then set it to be.
This is the mechanism by which windowed ranges are switched (the windows share the same ‘source’ buffer, and the value of the active buffer range stored on that buffer indicates the most recently active window).
At initialisation, all RangeResponse have their active buffer range set to the empty range, Range(0,0).
- Return type:
- check_is_windowed()[source]
Whether the associated request is windowed. Used to set is_windowed on init
- Return type:
- property client
The request’s client.
- close()[source]
Close the associated httpx.Response object. In single request mode, there is just the one (shared with all the ‘windowed’ responses).
- property is_active_buf_range: bool
The active range is stored on the buffer the HTTP response stream writes to (in the active_buf_range attribute) so that whenever the active range changes, it is detectable immediately (all interfaces to read/seek/load the buffer are ‘guarded’ by a call to buf_keep() to achieve this).
When this change is detected, since the cursor may be in another range of the shared source buffer (where the previously active window was busy doing its thing), the cursor is first moved to the last stored tell() position, which is stored on each RangeResponse in the told attribute, and initialised as 0 so that on first use it simply refers to the start position of the window range.
Note that the active range only changes for ‘windowed’ RangeResponse objects sharing a ‘source’ buffer with a source _ranges RangeDict. To clarify: the active range changes on first use for non-windowed ranges, since the active range is initialised as the empty range (but after that it doesn’t!)
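The idea of several windows sharing one buffer, each restoring its own stored cursor before reading, can be sketched with io.BytesIO. This is a simplified illustration, not RangeResponse itself: the Window class is hypothetical, and it restores the cursor on every read instead of tracking an active range on the buffer.

```python
import io


class Window:
    """A view onto [start, stop) of a shared buffer with its own stored cursor."""

    def __init__(self, shared, start, stop):
        self.shared = shared
        self.start = start
        self.stop = stop
        self.told = 0  # window-relative position, initialised at the window start

    def read(self, size=None):
        # Another window may have moved the shared cursor, so restore ours first.
        self.shared.seek(self.start + self.told)
        remaining = (self.stop - self.start) - self.told
        count = remaining if size is None else min(size, remaining)
        data = self.shared.read(count)
        self.told += len(data)  # store the new position for the next read
        return data


shared = io.BytesIO(b"0123456789")
first = Window(shared, 0, 4)
second = Window(shared, 6, 10)

# Interleaved reads do not disturb each other's positions.
chunks = [first.read(2), second.read(2), first.read(), second.read()]
print(chunks)
```

Storing the position per window is what lets interleaved reads on different windows each pick up where they left off.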
- property is_closed
True if the associated httpx.Response object is closed. For a windowed response in single request mode, this will be shared with any/all other windowed responses on the stream.
- is_consumed()[source]
Whether the stream should be considered consumed, as indicated by the tell() position (‘consumed’ or ‘read so far’) along with the tail_mark.
The tail_mark is part of a mechanism to ‘shorten’ ranges when an overlap is detected, to preserve the one-to-one integrity of the RangeDict (see notes on the “replant” policy of handle_overlap(), set by the pruning_level passed into RangeStream on initialisation).
Note that there is (absolutely!) nothing stopping a stream from being re-consumed, but this library works on the assumption that all streams will be handled in an efficient manner (with any data read out from them either used once only or else will be reused from the first output rather than re-accessed directly from the stream itself).
To this end, RangeStream has measures in place to “decommission” ranges once they are consumed (see in particular burn_range() and handle_overlap()).
- Return type:
- property is_in_window: bool
Whether file cursor is in the window. Trivially true for a non-windowed request, otherwise checks if the file cursor is currently within (or exactly at the end of) the window range.
- property name: str
A wrapper to access the name of the ‘parent’ RangeStream.
- prepare_reading_window()[source]
Prepare the stream cursor for reading (unclear if this should only be done on initialisation…) Should be done every time if the cursor is shared, but is it?
- Return type:
- read(size=None)[source]
File-like reading within the range request stream, with careful handling of windowed ranges and tail marks.
- Return type:
- seek(position, whence=0)[source]
File-like seeking within the range request stream. Synchronous only.
- set_active_buf_range(rng)[source]
Update the _bytes buffer’s active_buf_range attribute with the given Range (rng).
- Return type:
- property source_aiterator
The async iterator associated with the source range, for a windowed range.
- property source_iterator
The iterator associated with the source range, for a windowed range.
- property source_range: Range
Wrapper for window_on_range with a less confusing name to access. Note that this will be the empty range if the request is not a windowed request.
- property source_range_response: RangeResponse
The RangeResponse associated with the source range, for a windowed range. Only access this if windowed (if not a windowed range, this will give the RangeResponse associated with the range at position 0, as the default window_on_range value for non-windowed ranges is the empty range [0,0), whose start will be used as the key for the _ranges RangeDict).
- store_tell()[source]
Store the [window-relative] tell value in told in the event of any read, seek, or load on the stream, when accessed through the RangeResponse (do not access directly if you want to keep a reliable stored value for told).
- Return type:
- tail_mark: int = 0
The amount by which to shorten the ‘tail’ (i.e. the upper end) of the range when deciding if it is ‘consumed’. Incremented within the handle_overlap() method when the pruning_level is set to 0 (indicating a “replant” policy).
Under a ‘replant’ policy, when a new range is to be added and would overlap at the tail of an existing range, the pre-existing range should be effectively truncated by ‘marking its tail’ (where an existing range is assumed here to only be considered a range if it is not ‘consumed’ yet).
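To make the tail-marking idea concrete, here is a minimal sketch. It assumes a hypothetical WindowState class with half-open integer bounds; the names and the exact comparison are illustrative and are not taken from the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class WindowState:
    """Hypothetical stand-in for the bookkeeping on one range response."""

    start: int          # inclusive start of the requested range
    stop: int           # exclusive stop of the requested range
    told: int = 0       # bytes read so far, relative to start
    tail_mark: int = 0  # bytes trimmed off the upper end

    def mark_tail(self, new_start: int) -> None:
        """Shorten the tail so this range ends where an overlapping new range begins."""
        effective_stop = self.stop - self.tail_mark
        if self.start < new_start < effective_stop:
            self.tail_mark += effective_stop - new_start

    def is_consumed(self) -> bool:
        """Consumed once the cursor reaches the shortened end of the range."""
        return self.told >= (self.stop - self.start) - self.tail_mark


state = WindowState(start=0, stop=10)
state.mark_tail(new_start=7)  # a new range [7, 12) overlaps the tail
state.told = 7                # pretend 7 bytes have been read
```

With a tail mark of 3, the range [0, 10) counts as consumed after 7 bytes rather than 10, so the new range can own bytes 7 onwards without ambiguity.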
- tell_abs(live=True)[source]
Get the absolute file cursor position from either the active range response tell (if live is True: default) or the position stored on the active range response (if live is False).
Both are given as absolute positions by adding the window_offset (which is 0 for non-windowed ranges).
- Return type:
- property total_len_to_read
- property url: str
A wrapper to access the url of the ‘parent’ RangeStream.
Range operations
These tools perform transformations on, or output particular information from, the data structures which store ranges.
- range_streams.range_utils.most_recent_range(stream, internal=True)[source]
For all of the RangeResponse values in the RangeDict, list the ranges from their original request in order of registration.
If internal is True, use _ranges as the RangeDict, else use the ‘external’ (computed) property ranges. The external ones take into account the position the file has been read/seeked to.
- Parameters:
stream (range_streams.stream.RangeStream) – Either the internal or external ranges of a RangeStream.
internal (bool) – Whether to use the internal or external ranges.
- Return type:
Range | None
- range_streams.range_utils.range_span(ranges)[source]
Given a list of Range, calculate their ‘span’ (i.e. the range spanned from their minimum to maximum). This span may of course not be completely ‘covered’ by the ranges in the list.
Assumes the input list of RangeSet is in ascending order, and switches the order if not.
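As an illustration of what a ‘span’ means here, the following sketch works on plain (start, stop) tuples treated as half-open intervals. The span function is a hypothetical helper written for this example, not the library's range_span, and it takes the minimum and maximum directly rather than relying on input order.

```python
def span(ranges: list[tuple[int, int]]) -> tuple[int, int]:
    """Return the half-open interval from the lowest start to the highest stop."""
    if not ranges:
        raise ValueError("need at least one range")
    return min(r[0] for r in ranges), max(r[1] for r in ranges)


gappy = [(0, 3), (10, 12)]
whole = span(gappy)  # positions 3 to 9 lie inside the span but in none of the ranges
```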
- range_streams.range_utils.range_termini(rng)[source]
Get the inclusive start and end positions [start,end] from a ranges.Range. These are referred to as the ‘termini’. Ranges are always ascending.
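The difference between a half-open range and its inclusive termini matters when writing an HTTP Range header, which uses inclusive byte positions. The sketch below uses plain tuples instead of ranges.Range objects; termini and range_header are hypothetical helper names for this example only.

```python
def termini(rng: tuple[int, int]) -> tuple[int, int]:
    """Convert a half-open (start, stop) pair into inclusive (start, end)."""
    start, stop = rng
    if stop <= start:
        raise ValueError("an empty or descending range has no termini")
    return start, stop - 1


def range_header(rng: tuple[int, int]) -> dict[str, str]:
    """Build an HTTP Range header, which counts bytes inclusively."""
    start, end = termini(rng)
    return {"Range": f"bytes={start}-{end}"}


header = range_header((0, 3))  # the first three bytes of the file
```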
- range_streams.range_utils.ranges_in_reg_order(ranges)[source]
Given a RangeDict, list the ranges in order of registration.
Presumes integrity is already checked.
- Parameters:
ranges (RangeDict) – Either the internal or external ranges of a RangeStream.
- Return type:
- range_streams.range_utils.response_ranges_in_reg_order(ranges)[source]
For all of the RangeResponse values in the RangeDict, list the ranges from their original request in order of registration.
- Parameters:
ranges (RangeDict) – Either the internal or external ranges of a RangeStream.
- Return type:
Streaming codecs
Codecs for the PNG, ZIP, .conda, and TAR formats, to assist in handling these file types using the information in the header sections defined in their specifications.
- class range_streams.codecs.zip.ZipStream(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True, scan_contents=True)[source]
Bases: RangeStream
As for RangeStream, but if scan_contents is True, then immediately call check_central_dir_rec() on initialisation (which will perform a series of range requests to identify the files in the zip from the End of Central Directory Record and Central Directory Record), setting zipped_files, and add() their file content ranges to the stream.
Setting this can be postponed until first access of the filename_list property (this will not add() them to the ZipStream).
Once parsed, the file contents are stored as a list of ZippedFileInfo objects (in the order they appear in the Central Directory Record) in the zipped_files attribute. Each of these objects has a file_range() method which gives the range of its file content bytes within the ZipStream.
- __init__(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True, scan_contents=True)[source]
Set up a stream for the ZIP archive at url, with an initial range to be requested (HTTP partial content request). If the range is left as the empty range (default: Range(0,0)), a HEAD request will be sent instead, so as to set the total size of the target file on the total_bytes property.
By default (if client is left as None) a fresh httpx.Client will be created for each stream.
The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.
The pruning_level controls the policy for overlap handling (0 will resize overlapped ranges, 1 will delete overlapped ranges, and 2 will raise an error when a new range is added which overlaps a pre-existing range).
If single_request is True (default: False), then passing an empty byte_range sends a standard streaming GET request (not a partial content request at all) instead of a HEAD request. The class then provides an interface that ‘simulates’ range requests, i.e. as if each time add() was used the range requests were being returned instantly (as everything needed was already obtained on the first request at initialisation). This is more performant when reading a stream linearly.
See docs for the handle_overlap() method for further details.
- Parameters:
client – (httpx.Client | None) The HTTPX client to use for HTTP requests
byte_range (Range | tuple[int, int]) – (Range | tuple[int, int]) The range of positions on the file to be requested
pruning_level (int) – (int) Either 0 (‘replant’), 1 (‘burn’), or 2 (‘strict’)
single_request (bool) – (bool) Whether to use a single GET request and just add ‘windows’ onto this rather than create multiple partial content requests.
force_async (bool) – (bool | None) Whether to require the client to be httpx.AsyncClient, and if no client is given, to create one on initialisation. (Experimental/WIP)
chunk_size (int | None) – (int | None) The chunk size used for the httpx.Response.iter_raw response byte iterators
raise_response (bool) – (bool) Whether to raise HTTP status code exceptions
scan_contents (bool) – (bool) Whether to scan the archive contents upon initialisation and add the archive’s file ranges
- check_central_dir_rec()[source]
Read the range corresponding to the Central Directory Record (after check_end_of_central_dir_rec() has been called).
- check_end_of_central_dir_rec()[source]
Using the stored start position of the End Of Central Directory Record (or calculating and storing it if it is not yet set on the object),
- check_end_of_central_dir_start()[source]
If the zip file lacks a comment, the End Of Central Directory Record will be the last thing in it, so taking the range equal to its expected size and checking for the expected start signature will find it.
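The check described above can be sketched on an in-memory archive. The End Of Central Directory Record is 22 bytes when there is no comment and begins with the signature PK\x05\x06; parse_eocd_tail is a hypothetical helper for this example (not a range_streams method), and over HTTP the tail bytes would come from a range request for the last 22 bytes rather than from slicing.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # 0x06054b50 stored little-endian
EOCD_MIN_SIZE = 22              # size of the record when the comment is empty


def parse_eocd_tail(data: bytes) -> tuple[int, int, int]:
    """Return (entry count, central directory size, central directory offset)."""
    tail = data[-EOCD_MIN_SIZE:]
    if tail[:4] != EOCD_SIGNATURE:
        raise ValueError("no EOCD signature at the expected position")
    # Fields: signature, disk number, disk holding the central directory,
    # entries on this disk, total entries, directory size, directory offset,
    # comment length
    fields = struct.unpack("<4s4H2IH", tail)
    total_entries, cd_size, cd_offset = fields[4], fields[5], fields[6]
    return total_entries, cd_size, cd_offset


# Build a small comment-free archive in memory to test against
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"hello")
    zf.writestr("b.txt", b"world")
archive = buf.getvalue()
total, cd_size, cd_offset = parse_eocd_tail(archive)
```

The offset and size recovered this way locate the Central Directory Record, which is the next range a streaming reader would need to request.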
- decompress_zipped_file(zf_info, method=None, ext=None)[source]
Given a ZippedFileInfo object zf_info, and (optionally) its compression method [or else detecting that], decompress its bytes from the stream.
- property filename_list: list[str]
Return only the file name list from the stored list of 2-tuples of (filename, extra bytes).
- get_central_dir_bytes(step=20)[source]
Using the stored start position of the End Of Central Directory Record (or calculating and storing it if it is not yet set on the object), identify the files in the central directory record by searching backwards from the start of the End of Central Directory Record signature until finding the start of the Central Directory Record.
- class range_streams.codecs.conda.CondaStream(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True, scan_contents=True)[source]
Bases: ZipStream
- __init__(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=False, force_async=False, chunk_size=None, raise_response=True, scan_contents=True)[source]
Set up a stream for the conda (ZIP) archive at url, with an initial range to be requested (HTTP partial content request). If the range is left as the empty range (default: Range(0,0)), a HEAD request will be sent instead, so as to set the total size of the target file on the total_bytes property.
By default (if client is left as None) a fresh httpx.Client will be created for each stream.
The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.
The pruning_level controls the policy for overlap handling (0 will resize overlapped ranges, 1 will delete overlapped ranges, and 2 will raise an error when a new range is added which overlaps a pre-existing range).
See docs for the handle_overlap() method for further details.
- Parameters:
client – (httpx.Client | None) The HTTPX client to use for HTTP requests
byte_range (Range | tuple[int, int]) – (Range | tuple[int, int]) The range of positions on the file to be requested
pruning_level (int) – (int) Either 0 (‘replant’), 1 (‘burn’), or 2 (‘strict’)
single_request (bool) – (bool) Whether to use a single GET request and just add ‘windows’ onto this rather than create multiple partial content requests.
force_async (bool) – (bool | None) Whether to require the client to be httpx.AsyncClient, and if no client is given, to create one on initialisation. (Experimental/WIP)
chunk_size (int | None) – (int | None) The chunk size used for the httpx.Response.iter_raw response byte iterators
raise_response (bool) – (bool) Whether to raise HTTP status code exceptions
scan_contents (bool) – (bool) Whether to scan the archive contents upon initialisation and add the archive’s file ranges
- validate_files()[source]
After zipped_files is set (as a list of ZippedFileInfo), validate that they meet the specification of the .conda file format. This means: 1 info-...tar.zst, 1 pkg-...tar.zst, and 1 metadata.json. The simplest way to uniquely identify them is to sort alphabetically by filename and check file prefixes/suffixes.
- Return type:
- class range_streams.codecs.tar.TarStream(url, client=None, byte_range=Range[0, 0), pruning_level=0, scan_headers=True, single_request=False, force_async=False, chunk_size=None, raise_response=True)[source]
Bases: RangeStream
As for RangeStream, but if scan_headers is True, then immediately call check_header_recs() on initialisation (which will perform the necessary range requests to identify the files in the tar from the header records), setting tarred_files, and add() their file content ranges to the stream.
Setting this can be postponed until first access of the filename_list property (this will not add() them to the TarStream).
Once parsed, the file contents are stored as a list of TarredFileInfo objects (in the order they appear in the header record) in the tarred_files attribute. Each of these objects has a file_range() method which gives the range of its file content bytes within the TarStream.
- __init__(url, client=None, byte_range=Range[0, 0), pruning_level=0, scan_headers=True, single_request=False, force_async=False, chunk_size=None, raise_response=True)[source]
Set up a stream for the TAR archive at url, with an initial range to be requested (HTTP partial content request). If the range is left as the empty range (default: Range(0,0)), a HEAD request will be sent instead, so as to set the total size of the target file on the total_bytes property.
By default (if client is left as None) a fresh httpx.Client will be created for each stream.
The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.
The pruning_level controls the policy for overlap handling (0 will resize overlapped ranges, 1 will delete overlapped ranges, and 2 will raise an error when a new range is added which overlaps a pre-existing range).
If single_request is True (default: False), then passing an empty byte_range sends a standard streaming GET request (not a partial content request at all) instead of a HEAD request. The class then provides an interface that ‘simulates’ range requests, i.e. as if each time add() was used the range requests were being returned instantly (as everything needed was already obtained on the first request at initialisation). This is more performant when reading a stream linearly.
See docs for the handle_overlap() method for further details.
- Parameters:
client – (httpx.Client | None) The HTTPX client to use for HTTP requests
byte_range (Range | tuple[int, int]) – (Range | tuple[int, int]) The range of positions on the file to be requested
pruning_level (int) – (int) Either 0 (‘replant’), 1 (‘burn’), or 2 (‘strict’)
scan_headers (bool) – (bool) Whether to scan the archive headers upon initialisation and add the archive’s file ranges
single_request (bool) – (bool) Whether to use a single GET request and just add ‘windows’ onto this rather than create multiple partial content requests.
force_async (bool) – (bool | None) Whether to require the client to be httpx.AsyncClient, and if no client is given, to create one on initialisation. (Experimental/WIP)
chunk_size (int | None) – (int | None) The chunk size used for the httpx.Response.iter_raw response byte iterators
raise_response (bool) – (bool) Whether to raise HTTP status code exceptions
- check_header_recs()[source]
Scan through all header records in the file, building a list of TarredFileInfo objects for the files described by the headers (but do not download the corresponding archived file ranges).
For efficiency, only look at the particular fields of interest, not the entire header each time.
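A header-only scan of this kind can be sketched on an in-memory archive. The example assumes plain USTAR headers with short ASCII names (name in bytes 0 to 99, octal size in bytes 124 to 135) and ignores the prefix field and extended headers; scan_tar_headers is a hypothetical helper, and over HTTP each header block would be fetched with its own small range request.

```python
import io
import tarfile

BLOCK = 512  # tar archives are laid out in 512-byte blocks


def scan_tar_headers(data: bytes) -> list[tuple[str, int, int]]:
    """List (name, content start, content size) per member, reading headers only."""
    found = []
    pos = 0
    while pos + BLOCK <= len(data):
        header = data[pos : pos + BLOCK]
        name = header[:100].rstrip(b"\0").decode()
        if not name:
            break  # an all-NUL name means the end-of-archive padding was reached
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)
        found.append((name, pos + BLOCK, size))
        # File content is padded up to a whole number of blocks
        padded = (size + BLOCK - 1) // BLOCK * BLOCK
        pos += BLOCK + padded
    return found


# Build a two-member archive in memory to scan
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    for member_name, payload in [("a.txt", b"hello"), ("b.bin", b"x" * 600)]:
        info = tarfile.TarInfo(member_name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
archive = buf.getvalue()
members = scan_tar_headers(archive)
```

Each tuple gives enough information to request only that member's bytes later, which is the point of scanning headers without downloading file contents.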
- read_file_name(start_pos_offset=0)[source]
Return the file name by reading the file name for the header block starting at start_pos_offset (which for the first file will be 0, the default). Tar archives end with at least two empty blocks (i.e. 1024 bytes of padding), but there may be more than that. To catch this possibility, this method will raise a StopIteration error if the file name is NULL (i.e. if what was expected to be a file name is actually padding).
- Return type:
- class range_streams.codecs.png.PngStream(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=True, force_async=False, chunk_size=None, raise_response=True, scan_ihdr=True, enumerate_chunks=True)[source]
Bases: RangeStream
As for RangeStream, but if scan_ihdr is True, then immediately call scan_ihdr() on initialisation (which will perform the necessary range request to read PNG metadata from its IHDR chunk), setting various attributes on the IHDR object.
Populating these attributes can be postponed [until manually calling scan_ihdr() and enumerate_chunks()] to avoid sending any range requests at initialisation.
- __init__(url, client=None, byte_range=Range[0, 0), pruning_level=0, single_request=True, force_async=False, chunk_size=None, raise_response=True, scan_ihdr=True, enumerate_chunks=True)[source]
Set up a stream for the PNG file at url, with an initial range to be requested (HTTP partial content request). If the range is left as the empty range (default: Range(0,0)), a HEAD request will be sent instead, so as to set the total size of the target file on the total_bytes property.
By default (if client is left as None) a fresh httpx.Client will be created for each stream.
The byte_range can be specified as either a Range object, or 2-tuple of integers ((start, end)), interpreted either way as a half-closed interval [start, end), as given by Python’s built-in range.
The pruning_level controls the policy for overlap handling (0 will resize overlapped ranges, 1 will delete overlapped ranges, and 2 will raise an error when a new range is added which overlaps a pre-existing range).
If single_request is True (default: True), then passing an empty byte_range sends a standard streaming GET request (not a partial content request at all) instead of a HEAD request. The class then provides an interface that ‘simulates’ range requests, i.e. as if each time add() was used the range requests were being returned instantly (as everything needed was already obtained on the first request at initialisation). This is more performant when reading a stream linearly, and defaults to True in the PNG codec as chunks are read linearly.
See docs for the handle_overlap() method for further details.
- Parameters:
client – (httpx.Client | None) The HTTPX client to use for HTTP requests
byte_range (Range | tuple[int, int]) – (Range | tuple[int, int]) The range of positions on the file to be requested
pruning_level (int) – (int) Either 0 (‘replant’), 1 (‘burn’), or 2 (‘strict’)
single_request (bool) – (bool) Whether to use a single GET request and just add ‘windows’ onto this rather than create multiple partial content requests.
force_async (bool) – (bool | None) Whether to require the client to be httpx.AsyncClient, and if no client is given, to create one on initialisation. (Experimental/WIP)
scan_ihdr (bool) – (bool) Whether to scan the IHDR chunk on initialisation
enumerate_chunks (bool) – (bool) Whether to step through each chunk (read its metadata, and proceed until all chunks have been identified) upon initialisation
chunk_size (int | None) – (int | None) The chunk size used for the httpx.Response.iter_raw response byte iterators
raise_response (bool) – (bool) Whether to raise HTTP status code exceptions
- property alpha_as_direct
To avoid distinguishing ‘direct’ image transparency (in IDAT) from ‘indirect’ (or computed, from tRNS) palette transparency, check for a colour map and then check for a tRNS chunk to determine overall whether this image has an alpha channel in whichever way.
- any_semitransparent_idat(nonzero=True)[source]
Whether there are any non-255 values in the alpha channel of the PNG, determined from IDAT chunk alone. If not, the alpha channel serves no purpose in practice, and the image may be considered non-transparent.
If nonzero is True (the default), check for semitransparent values only, rather than any non-opaque values (i.e. 0 < A < 255 rather than 0 <= A < 255).
Note: presumes alpha_as_direct() has already been called, so the image is known to have 4 channels.
- Parameters:
nonzero (bool) – Whether to return True only if the image has ‘intermediate’ (between 0 and 255) values, otherwise whether they’re below 255.
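The two modes of the check can be illustrated on a plain list of alpha values. This sketch assumes 8-bit alpha samples have already been extracted from the decoded image; any_semitransparent is a hypothetical helper for this example, not the method itself.

```python
def any_semitransparent(alpha: list[int], nonzero: bool = True) -> bool:
    """Report whether any 8-bit alpha sample falls below full opacity (255).

    With nonzero=True, fully transparent samples (0) are ignored, so only
    intermediate values count.
    """
    lower = 1 if nonzero else 0
    return any(lower <= a < 255 for a in alpha)


cutout = [0, 255, 255, 0]   # only fully transparent or fully opaque pixels
feathered = [0, 128, 255]   # contains an intermediate alpha value
```

An image like cutout has no intermediate alpha, so it only registers as transparent when nonzero is False.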
- property bit_depth_as_direct
Indexed images may report an IHDR bit depth other than 8, however the PLTE uses 8 bits per sample regardless of image bit depth, so override it to avoid distinguishing ‘direct’ bit depth from ‘indirect’ palette bit depth.
- property channel_count_as_direct
If the image is indexed on a palette, then the channel count in the IHDR will be 1 even though the underlying sample contains 3 channels (R,G,B). To avoid distinguishing ‘direct’ image channels (in IDAT) from ‘indirect’ (or computed, from tRNS) palette channels, check for a colour map and then check for a tRNS chunk to determine overall whether this image has an extra channel for transparency.
- property chunks
‘Gate’ to the internal _chunks attribute.
If this property is called before the internal attribute is set (‘prematurely’), to avoid an access error it will ‘proactively’ call populate_chunks() before returning the gated internal attribute.
- enumerate_chunks()[source]
Parse the length and type fields of each chunk, then skip past the chunk data and CRC fields, so as to enumerate all chunks in the PNG (but request and read as little as possible). Build a dictionary of all chunks with keys of the chunk type (four letter strings) and values of lists (since some chunks e.g. IDAT can appear multiple times in the PNG).
See the official specification for full details (or Wikipedia, or the W3C).
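The enumeration can be sketched on an in-memory PNG. Each chunk consists of a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC, following the 8-byte file signature. The helper names make_chunk and enumerate_png_chunks are hypothetical and the dictionary layout is an assumption for this example; over HTTP, each 8-byte length and type pair would be read with a small range request.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Assemble one chunk: length, type, data, then a CRC over type + data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def enumerate_png_chunks(png: bytes) -> dict[str, list[tuple[int, int]]]:
    """Map each chunk type to a list of (data start, data length) pairs."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG")
    chunks: dict[str, list[tuple[int, int]]] = {}
    pos = 8
    while pos < len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos : pos + 8])
        chunks.setdefault(chunk_type.decode("ascii"), []).append((pos + 8, length))
        pos += 8 + length + 4  # skip the data and the CRC without reading them
    return chunks


# A 1x1 8-bit greyscale image, split over two IDAT chunks for illustration
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\xff")  # filter byte 0, then one white pixel
png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", ihdr)
    + make_chunk(b"IDAT", idat[:3])
    + make_chunk(b"IDAT", idat[3:])
    + make_chunk(b"IEND", b"")
)
found = enumerate_png_chunks(png)
```

Storing lists per chunk type keeps both IDAT entries, so their data ranges can later be fetched and concatenated in order.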
- async enumerate_chunks_async()[source]
Parse the length and type fields of each chunk, then skip past the chunk data and CRC fields, so as to enumerate all chunks in the PNG (but request and read as little as possible). Build a dictionary of all chunks with keys of the chunk type (four letter strings) and values of lists (since some chunks e.g. IDAT can appear multiple times in the PNG).
See the official specification for full details (or Wikipedia, or the W3C).
- get_idat_data()[source]
Decompress the IDAT chunk(s) and concatenate, then confirm the length is exactly equal to height * (1 + width * bit_depth), and filter it (removing the filter byte at the start of each scanline) using reconstruct_idat().
- has_chunk(chunk_type)[source]
Determine whether the given chunk type is one of the chunks defined in the PNG. If the chunks have not yet been parsed, they will first be enumerated.
- Return type:
- populate_chunks()[source]
Call enumerate_chunks() and store in the internal _chunks attribute, accessible through the chunks property.
If the chunks property is called ‘prematurely’, to avoid an access error it will ‘proactively’ call this method before returning the gated internal attribute.