ratansunpy.scrapper package
Submodules
ratansunpy.scrapper.scrapper module
- class ratansunpy.scrapper.scrapper.Scrapper(baseurl: str, regex_pattern: str | None = None, condition: Callable[[str, str, str], str] | None = None, filter: Callable[[str], bool] | None = None, **kwargs: Any)
Bases: object
- check_date_in_timerange_from_file_date(file_date: str, timerange: TimeRange) -> bool
Check if a given file date is within the specified time range.
- Parameters:
file_date – The file date as a string (format: “%Y-%m-%d”).
timerange – The TimeRange object representing the time range.
- Returns:
True if the date is within the range, False otherwise.
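As a rough illustration of the check described above, the sketch below parses a "%Y-%m-%d" string with the standard library and compares it against two bounds. The function name `date_in_range` and the use of plain `datetime` bounds in place of a `TimeRange` object are assumptions made to keep the example self-contained; the package's own implementation may differ, for instance in whether the end bound is inclusive.

```python
from datetime import datetime


def date_in_range(file_date: str, start: datetime, end: datetime) -> bool:
    """Return True if file_date ("%Y-%m-%d") lies in [start, end].

    Inclusive bounds are an assumption of this sketch.
    """
    parsed = datetime.strptime(file_date, "%Y-%m-%d")
    return start <= parsed <= end


start = datetime(2021, 10, 1)
end = datetime(2021, 10, 31)
print(date_in_range("2021-10-12", start, end))  # True
print(date_in_range("2021-11-01", start, end))  # False
```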
- check_date_in_timerange_from_url(url: str, timerange: TimeRange) -> bool
Check if the date extracted from a URL is within the given time range.
- Parameters:
url – The URL string.
timerange – The TimeRange object representing the time range.
- Returns:
True if the date is within the range, False otherwise.
- extract_date_from_url(url)
Extract date from a given URL based on the base URL’s pattern.
- Parameters:
url – The URL string.
- Returns:
The extracted Time object.
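One way to picture this kind of extraction is to turn the strftime-style base URL into a regular expression and parse the captured fields. The sketch below does that with the standard library only. The helper name `extract_date`, the small directive table, and the `datetime` return type (the method above returns a Time object) are all assumptions for illustration, not the package's code. Repeated directives such as the second `%Y` are matched but not cross-checked against the first.

```python
import re
from datetime import datetime

# Regex fragments for a handful of strftime directives (not exhaustive).
DIRECTIVE_REGEX = {
    "%Y": r"\d{4}",
    "%m": r"\d{2}",
    "%d": r"\d{2}",
    "%H": r"\d{2}",
    "%M": r"\d{2}",
    "%S": r"\d{2}",
}


def extract_date(url: str, pattern: str) -> datetime:
    """Parse the date embedded in url according to a strftime-style pattern."""
    regex_parts = []
    fmt_parts = []
    # re.split with a capturing group keeps the directives in the result.
    for part in re.split(r"(%[YmdHMS])", pattern):
        if part in DIRECTIVE_REGEX:
            if part in fmt_parts:
                # Repeated directive: match it, but capture only the first one,
                # because strptime rejects a format with duplicate directives.
                regex_parts.append(DIRECTIVE_REGEX[part])
            else:
                regex_parts.append(f"({DIRECTIVE_REGEX[part]})")
                fmt_parts.append(part)
        else:
            regex_parts.append(re.escape(part))
    match = re.fullmatch("".join(regex_parts), url)
    if match is None:
        raise ValueError(f"{url!r} does not match pattern {pattern!r}")
    return datetime.strptime(" ".join(match.groups()), " ".join(fmt_parts))


pattern = ("ftp://ftp.ngdc.noaa.gov/STP/swpc_products/daily_reports/"
           "solar_region_summaries/%Y/%m/%Y%m%dSRS.txt")
url = ("ftp://ftp.ngdc.noaa.gov/STP/swpc_products/daily_reports/"
       "solar_region_summaries/2021/10/20211012SRS.txt")
print(extract_date(url, pattern))  # 2021-10-12 00:00:00
```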
- static floor_datetime(date: Time, timestep: relativedelta) -> datetime
Floor the given datetime to the nearest significant time unit.
- Parameters:
date – The Time object to floor.
timestep – The relativedelta object representing the smallest significant time unit.
- Returns:
The floored datetime object.
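Flooring to a time unit amounts to resetting every field finer than that unit. The following minimal sketch shows the idea with the standard library. It takes a unit name such as "day" rather than a relativedelta, and the names `floor_to_unit`, `UNITS`, and `RESET` are hypothetical, so it should be read as an analogy to the method above rather than its implementation.

```python
from datetime import datetime

# Units from coarsest to finest, and the value each field takes when reset.
UNITS = ["year", "month", "day", "hour", "minute", "second"]
RESET = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0}


def floor_to_unit(date: datetime, unit: str) -> datetime:
    """Reset every field finer than unit (microseconds are always dropped)."""
    finer = UNITS[UNITS.index(unit) + 1:]
    return date.replace(microsecond=0, **{name: RESET[name] for name in finer})


stamp = datetime(2021, 10, 12, 14, 35, 7)
print(floor_to_unit(stamp, "day"))    # 2021-10-12 00:00:00
print(floor_to_unit(stamp, "month"))  # 2021-10-01 00:00:00
```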
- form_fileslist(timerange: TimeRange) -> List[str]
Retrieve a list of files from an HTTP or FTP server within the specified time range.
- Parameters:
timerange (TimeRange) – The TimeRange object representing the time range.
- Returns:
A list of file URLs.
- Example:
Usage example based on the SWPC Solar Region Summary (FTP server):
>>> base_url_SRS = r'ftp://ftp.ngdc.noaa.gov/STP/swpc_products/daily_reports/solar_region_summaries/%Y/%m/%Y%m%dSRS.txt'
>>> scraper = Scrapper(base_url_SRS)
>>> t = TimeRange('2021-10-12', '2021-10-12')
>>> print(t)
(2021-10-12 00:00:00, 2021-10-12 00:00:00)
>>> for url in scraper.form_fileslist(t):
...     print(f'SRS url: {url}')
SRS url: ftp://ftp.ngdc.noaa.gov/STP/swpc_products/daily_reports/solar_region_summaries/2021/10/20211012SRS.txt
- Example:
Usage example based on RATAN (HTTP server):
>>> # Signature assumed from the names used in the body and the three-string condition callable.
>>> def build_date(year, month, date_match):
...     if int(year) < 2010 or (int(year) == 2010 and int(month) < 5):
...         return f'{year[:2]}{date_match[:-4]}-{date_match[-4:-2]}-{date_match[-2:]}'
...     else:
...         return f'{date_match[:-4]}-{date_match[-4:-2]}-{date_match[-2:]}'
>>> base_url_RATAN = 'http://spbf.sao.ru/data/ratan/%Y/%m/%Y%m%d_%H%M%S_sun+0_out.fits'
>>> regex_pattern_RATAN = r'((\d{6,8})[^0-9].*[^0-9]0_out.fits)'
>>> scraper = Scrapper(base_url_RATAN, regex_pattern=regex_pattern_RATAN, condition=build_date)
>>> t = TimeRange('2010-01-13', '2010-01-13')
>>> for url in scraper.form_fileslist(t):
...     print(f'RATAN url: {url}')
RATAN url: http://spbf.sao.ru/data/ratan/2010/01/100113sun0_out.fits
- ftpfiles(timerange: TimeRange) -> List[str]
Retrieve a list of files from an FTP server within the specified time range.
- Parameters:
timerange – The TimeRange object representing the time range.
- Returns:
A list of file URLs.
- httpfiles(timerange: TimeRange) -> List[str]
Retrieve a list of files from an HTTP server within the specified time range.
- Parameters:
timerange – The TimeRange object representing the time range.
- Returns:
A list of file URLs.
- range(timerange: TimeRange) -> List[str]
Generate a list of directories within the time range based on the smallest significant pattern.
- Parameters:
timerange – The TimeRange object representing the time range.
- Returns:
A list of directory paths.
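To make the idea of enumerating directories over a time range concrete, here is a simplified sketch. It walks the range one day at a time, formats each date with a directory pattern, and drops duplicates, which is enough for patterns whose finest unit is a day or coarser. The method above is described as stepping by the smallest significant pattern instead, so this is only an approximation. The name `directories_in_range` and the example.org URL are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def directories_in_range(dir_pattern: str, start: datetime, end: datetime) -> List[str]:
    """List the distinct directories that dir_pattern yields between start and end."""
    directories: List[str] = []
    current = start
    while current <= end:
        directory = current.strftime(dir_pattern)
        if directory not in directories:  # keep first occurrence, preserve order
            directories.append(directory)
        current += timedelta(days=1)
    return directories


dirs = directories_in_range(
    "http://example.org/data/%Y/%m/",  # hypothetical server
    datetime(2021, 12, 30),
    datetime(2022, 1, 2),
)
print(dirs)
# ['http://example.org/data/2021/12/', 'http://example.org/data/2022/01/']
```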
- static smallest_significant_pattern(pattern: str) -> relativedelta | None
Determine the smallest significant time unit (e.g., seconds, minutes, days) in the given pattern. Some related date and time formats are described at https://fits.gsfc.nasa.gov/iso-time.html
- Parameters:
pattern – The pattern string.
- Returns:
The smallest significant relativedelta object, or None if not found.
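A simple way to find the smallest unit in a strftime-style pattern is to test for directives from finest to coarsest and stop at the first hit. The sketch below returns a unit name rather than a relativedelta, and both the `smallest_unit` helper and its directive table are assumptions for illustration; the set of directives the package actually recognizes is not stated here.

```python
from typing import Optional

# Checked from finest to coarsest, so the first hit is the smallest unit.
DIRECTIVE_UNITS = [
    ("%S", "seconds"),
    ("%M", "minutes"),
    ("%H", "hours"),
    ("%d", "days"),
    ("%j", "days"),
    ("%m", "months"),
    ("%b", "months"),
    ("%Y", "years"),
    ("%y", "years"),
]


def smallest_unit(pattern: str) -> Optional[str]:
    """Return the finest time unit present in pattern, or None if there is none."""
    for directive, unit in DIRECTIVE_UNITS:
        if directive in pattern:
            return unit
    return None


print(smallest_unit("%Y/%m/%Y%m%d_%H%M%S_sun+0_out.fits"))  # seconds
print(smallest_unit("%Y/%m/"))                              # months
print(smallest_unit("static/path"))                         # None
```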