proboards_scraper.database¶
- class proboards_scraper.database.Database(db_path)[source]¶
This class serves as an interface for the SQLite database, and allows items to be inserted/updated or queried using a variety of specific functions that abstract away implementation details of the database and its schema.
- Parameters
db_path (
pathlib.Path
) – Path to SQLite database file.
- insert(obj, filters=None, update=False)[source]¶
Query the database for an object of the given sqlalchemy Metaclass using the given
filters
to determine if it already exists in the database. If it doesn’t, insert it into the database. Either way, return an integer status code indicating whether the object was inserted or updated, as well as the resulting object from the query.
Although this method can be called directly, it is preferable to call the corresponding insert_* or query_* wrapper methods instead, which simplify the task of querying/inserting into the database.
- Parameters
obj (
sqlalchemy.orm.decl_api.DeclarativeMeta
) – A sqlalchemy Metaclass instance corresponding to a database table class, i.e., an instance of one of the table classes documented in this module.
filters (
Optional[dict]
) – A dict of key/value pairs on which to filter the query results. The keys should correspond to the attributes of the Metaclass, i.e., attributes of the
obj
argument class. See example below. If
filters
is
None
, it defaults to the
id
attribute of
obj
, i.e.,
obj.id
.
update (
bool
) – Whether to update the database entry if the queried object already exists.
Example
The following example demonstrates how to insert a new user into the database. Note that we first create an instance of the user (which is passed to
insert()
) and filter by the user id (which is the
User
table primary key). In other words, this searches the database for an existing user with the given filter (i.e., the user with id 7) and, if the user doesn’t exist, inserts it into the database, then returns the inserted object:

```python
from pathlib import Path

from proboards_scraper.database import Database, User

user_data = {
    "id": 7,
    "date_registered": 1631019126,
    "email": "foo@bar.com",
    "name": "Snake Plissken",
    "username": "snake",
}
new_user = User(**user_data)

# db_path is documented as a pathlib.Path.
db = Database(Path("forum.db"))
inserted, new_user = db.insert(new_user, filters={"id": 7})
```
- Return type
Tuple[int, sqlalchemy.orm.decl_api.DeclarativeMeta]
- Returns
(inserted, ret)
- inserted:
An integer code denoting insert status.
0: The object failed to be inserted or updated.
1:
obj
was inserted into the database.
2:
obj
existed in the database and was updated.
- ret:
The inserted object, if the object didn’t previously exist in the database, or the existing object if it did already exist. It is effectively an updated version of
obj
.
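The status codes above can be illustrated with a minimal get-or-create sketch. It uses the standard-library sqlite3 module and a hypothetical `users` table instead of the package's SQLAlchemy models, so it is not the library's implementation. In particular, returning 0 when the row exists and `update` is false is an assumption about how code 0 is used.

```python
import sqlite3

def insert_or_update(conn, row, filters=None, update=False):
    """Return (status, row): 0 = nothing written, 1 = inserted, 2 = updated."""
    # Default filter mirrors the documented behaviour: match on the id.
    filters = filters or {"id": row["id"]}
    # Filter keys are assumed to be trusted column names (sketch only).
    where = " AND ".join(f"{key} = ?" for key in filters)
    found = conn.execute(
        f"SELECT id, name FROM users WHERE {where}", tuple(filters.values())
    ).fetchone()

    if found is None:
        conn.execute("INSERT INTO users (id, name) VALUES (:id, :name)", row)
        return 1, dict(row)
    if update:
        conn.execute("UPDATE users SET name = :name WHERE id = :id", row)
        return 2, dict(row)
    # Assumption: an existing row that is left untouched reports status 0.
    return 0, {"id": found[0], "name": found[1]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

first = insert_or_update(conn, {"id": 7, "name": "Snake Plissken"})
second = insert_or_update(conn, {"id": 7, "name": "S. Plissken"})
third = insert_or_update(conn, {"id": 7, "name": "S. Plissken"}, update=True)
```

Here `first` reports an insert, `second` leaves the stored row alone, and `third` overwrites it because `update=True`.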
- insert_avatar(avatar_, update=False)[source]¶
Insert a user avatar into the database; this method wraps
insert()
.
- Parameters
- Return type
- Returns
The inserted (or updated)
Avatar
object.
- insert_board(board_, update=False)[source]¶
Insert a board into the database; this method wraps
insert()
.
- Parameters
- Return type
- Returns
The inserted (or updated)
Board
object.
- insert_category(category_, update=False)[source]¶
Insert a category into the database; this method wraps
insert()
.
- Parameters
- Return type
- Returns
The inserted (or updated)
Category
object.
- insert_guest(guest_)[source]¶
Guest users are a special case of
User
. Guests are users who do not have a user id or a user profile page, and may include deleted users. Since there may be posts or threads started by guests, they are treated as normal users for the purposes of the database, except they are assigned a negative integer user id (which does not exist on the actual forum). Because a given guest has only a username and not a user id, guests are queried by name. If a guest does not already exist in the database, the next smallest negative integer is used as their user id.
- Parameters
guest_ (
dict
) – A dict containing a
name
key, corresponding to the guest user’s name.
- Return type
- Returns
The inserted or existing
User
object corresponding to the guest.
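The guest id scheme described above can be sketched as follows. This is only an illustration: the `guests` dict is a hypothetical in-memory stand-in for the user table, and reading "next smallest negative integer" as "one below the lowest id assigned so far" is an assumption.

```python
def get_or_create_guest(guests, name):
    """Return the (negative) id for a guest name, creating one if needed.

    `guests` is a hypothetical in-memory stand-in for the User table,
    mapping guest names to their negative ids.
    """
    if name in guests:
        return guests[name]
    # Assumption: ids count down from -1, so the next id is one below
    # the lowest id handed out so far.
    next_id = min(guests.values(), default=0) - 1
    guests[name] = next_id
    return next_id

guests = {}
a = get_or_create_guest(guests, "Guest A")
b = get_or_create_guest(guests, "Guest B")
a_again = get_or_create_guest(guests, "Guest A")
```

Looking up a guest by name a second time returns the id assigned the first time, and no new id is allocated.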
- insert_image(image_, update=False)[source]¶
Insert an image into the database; this method wraps
insert()
.
- Parameters
- Return type
- Returns
The inserted (or updated)
Image
object.
- insert_moderator(moderator_, update=False)[source]¶
Insert a moderator into the database; this method wraps
insert()
.
- Parameters
- Return type
- Returns
The inserted (or updated)
Moderator
object.
- insert_poll(poll_, update=False)[source]¶
Insert a poll into the database; this method wraps
insert()
.
- Parameters
- Return type
- Returns
The inserted (or updated)
Poll
object.
- insert_poll_option(poll_option_, update=False)[source]¶
Insert a poll option into the database; this method wraps
insert()
.
- Parameters
poll_option_ (
dict
) – A dict containing the keyword args (attributes) needed to instantiate a
PollOption
object.
update (
bool
) – See
insert()
.
- Return type
- Returns
The inserted (or updated)
PollOption
object.
- insert_poll_voter(poll_voter_, update=False)[source]¶
Insert a poll voter into the database; this method wraps
insert()
.
- Parameters
- Return type
- Returns
The inserted (or updated)
PollVoter
object.
- insert_post(post_, update=False)[source]¶
Insert a post into the database; this method wraps
insert()
.
- Parameters
- Return type
- Returns
The inserted (or updated)
Post
object.
- insert_shoutbox_post(shoutbox_post_, update=False)[source]¶
Insert a shoutbox post into the database; this method wraps
insert()
.
- Parameters
shoutbox_post_ (
dict
) – A dict containing the keyword args (attributes) needed to instantiate a
ShoutboxPost
object.
update (
bool
) – See
insert()
.
- Return type
- Returns
The inserted (or updated)
ShoutboxPost
object.
- insert_thread(thread_, update=False)[source]¶
Insert a thread into the database; this method wraps
insert()
.
- Parameters
- Return type
- Returns
The inserted (or updated)
Thread
object.
- insert_user(user_, update=False)[source]¶
Insert a user into the database; this method wraps
insert()
.
- Parameters
- Return type
- Returns
The inserted (or updated)
User
object.
- query_boards(board_id=None)[source]¶
Return a list of all boards if no
board_id
is provided or a specific board if it is provided.- Parameters
board_id (
Optional[int]
) – A board id (optional).- Return type
Union[List[dict], dict]
- Returns
A dict corresponding to a board in the database (if
board_id
was provided), else a list of dicts of all boards (if
board_id
was not provided).
Note
The returned
Board
object(s) are serialized to a human-readable JSON format (Python dict) by
serialize()
.
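Because the return type is either a single dict or a list of dicts, calling code may want to normalize the result before iterating. A minimal sketch, where `as_list` is a hypothetical helper and the sample records stand in for real query results:

```python
from typing import List, Union

def as_list(result: Union[List[dict], dict]) -> List[dict]:
    """Wrap a single serialized record in a list; pass lists through."""
    return result if isinstance(result, list) else [result]

# Hypothetical stand-ins for the two shapes a query can return.
one_board = {"id": 42, "name": "General"}
all_boards = [{"id": 42, "name": "General"}, {"id": 43, "name": "Off-topic"}]

names = [board["name"] for board in as_list(one_board)]
all_names = [board["name"] for board in as_list(all_boards)]
```

The same helper would apply to the thread and user queries, which share this return type.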
- query_threads(thread_id=None)[source]¶
Return a list of all threads if no
thread_id
is provided or a specific thread if it is provided.- Parameters
thread_id (
Optional[int]
) – A thread id (optional).- Return type
Union[List[dict], dict]
- Returns
A dict corresponding to a thread in the database (if
thread_id
was provided), else a list of dicts of all threads (if
thread_id
was not provided).
Note
The returned
Thread
object(s) are serialized to a human-readable JSON format (Python dict) by
serialize()
.
- query_users(user_id=None)[source]¶
Return a list of all users, if no
user_id
is provided, or a specific user, if it is provided.
- Parameters
user_id (
Optional[int]
) – A user id (optional).
- Return type
Union[List[dict], dict]
- Returns
A dict corresponding to a user in the database (if
user_id
was provided), else a list of dicts of all users (if
user_id
was not provided).
Note
The returned
User
object(s) are serialized to a human-readable JSON format (Python dict) by
serialize()
.
- class proboards_scraper.database.Avatar(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table links a user to their avatar.
- class proboards_scraper.database.Board(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table contains information on boards and their associated metadata.
- id¶
Board number obtained from the board URL, e.g.,
https://yoursite.proboards.com/board/42/general
refers to the “General” board with id 42.
- Type
int
- description¶
Board description.
- Type
str
- name¶
Board name. Required.
- Type
str
- parent_id¶
Board id of this board’s parent board, if it is a sub-board.
- Type
int
- password_protected¶
Whether the board is password-protected.
- Type
bool
- url¶
Board URL.
- Type
str
- sub_boards¶
List of this board’s sub-boards, if any.
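As an illustration of how a numeric id can be read from a URL of the form shown for the `id` attribute, here is a small sketch. The `id_from_url` helper is hypothetical and is not necessarily how the scraper itself extracts ids; it also accepts user profile URLs of the form `https://yoursite.proboards.com/user/21`.

```python
import re
from urllib.parse import urlparse

def id_from_url(url):
    """Return (kind, id) for a board or user URL, or None if neither."""
    # Only the path is inspected, e.g. "/board/42/general" or "/user/21".
    match = re.match(r"^/(board|user)/(\d+)", urlparse(url).path)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

board = id_from_url("https://yoursite.proboards.com/board/42/general")
user = id_from_url("https://yoursite.proboards.com/user/21")
other = id_from_url("https://yoursite.proboards.com/")
```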
- class proboards_scraper.database.Category(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table stores information on categories (on the main page) and their associated metadata.
- id¶
Category id number obtained from the main page source.
- Type
int
- name¶
Category name.
- Type
str
- class proboards_scraper.database.CSS(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
Table for storing information related to downloaded CSS files. The CSS files themselves should be stored on disk.
- id¶
An arbitrary autoincrementing primary key for each CSS file.
- Type
int
- description¶
Description of the file.
- Type
str
- filename¶
Filename of the CSS file stored on disk.
- Type
str
- md5_hash¶
MD5 hash of the downloaded file.
- Type
str
- url¶
Original URL of the CSS file.
- Type
str
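For reference, an MD5 hash like the one stored in `md5_hash` can be computed with the standard library. This is a generic sketch; the `md5_of_file` helper and the temporary file are illustrative, not part of the package.

```python
import hashlib
import tempfile
from pathlib import Path

def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstrate on a throwaway file standing in for a downloaded stylesheet.
with tempfile.TemporaryDirectory() as tmp:
    css_path = Path(tmp) / "style.css"
    css_path.write_bytes(b"body { margin: 0; }")
    file_hash = md5_of_file(css_path)
```

Comparing hashes is one way to detect that two downloaded files are byte-for-byte identical even when their filenames or URLs differ.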
- class proboards_scraper.database.Image(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table stores generic metadata for any image. Image files themselves should be downloaded and stored somewhere; this table only records the filename of the downloaded file (which may differ from the original filename, found in the url). The table may also be used to store metadata on files that no longer exist, e.g., an avatar hosted on a site that no longer exists, as a record of the original URL.
- id¶
An arbitrary autoincrementing primary key for each image.
- Type
int
- description¶
Description of the image. Optional.
- Type
str
- filename¶
Filename of the downloaded file on disk.
- Type
str
- md5_hash¶
MD5 hash of the downloaded file.
- Type
str
- size¶
Size, in bytes, of the downloaded file.
- Type
int
- url¶
Original URL of the file.
- Type
str
- class proboards_scraper.database.Moderator(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table links a user to a board they moderate. A given moderation relationship (i.e., board + user combination) must be unique.
- class proboards_scraper.database.Poll(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table stores information on a poll associated with a thread. Specifically, it links the poll id (which is the same as the thread id) to the options for the poll and the users who have voted in the poll.
- name¶
Poll name, i.e., the poll question.
- Type
str
- options¶
List of options associated with this poll; see
PollOption
.
- class proboards_scraper.database.PollOption(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table stores the number of votes for a poll option. A poll option must have an associated poll. Note that poll option ids are unique across the entire site, so we don’t need to create an arbitrary autoincrementing primary key and can simply use the integer value found on the forum itself.
- id¶
Poll option (answer) id obtained from scraping the site.
- Type
int
- name¶
Option name.
- Type
str
- votes¶
Number of votes this option received.
- Type
int
- class proboards_scraper.database.PollVoter(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table links a poll to users who have voted in the poll. Note that we can only see who has voted on a poll but not which option (
PollOption
) they voted for. Each row in the table corresponds to a unique poll/user combination, since a user can vote, at most, only once in a given poll.
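The "one row per poll/user combination" rule can be expressed as a composite uniqueness constraint. The sketch below uses sqlite3 directly with a hypothetical `poll_voter` table; the column names are assumptions, not the package's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the table and column names are illustrative only.
conn.execute(
    """
    CREATE TABLE poll_voter (
        id INTEGER PRIMARY KEY,
        poll_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        UNIQUE (poll_id, user_id)
    )
    """
)

def record_vote(poll_id, user_id):
    """Return True if the row was added, False if it already existed."""
    try:
        conn.execute(
            "INSERT INTO poll_voter (poll_id, user_id) VALUES (?, ?)",
            (poll_id, user_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejects a repeated poll/user pair.
        return False

results = [record_vote(10, 7), record_vote(10, 7), record_vote(11, 7)]
```

The same user may appear in different polls, but a repeated poll/user pair is rejected.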
- class proboards_scraper.database.Post(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table holds information for each post.
- id¶
Post id, obtained from the forum. Every post on a forum has a unique integer id.
- Type
int
- date¶
When the post was made (Unix timestamp).
- Type
int
- edit_user_id¶
User id of the user who made the last edit, if any; see
User
. If never edited, this value will be null.
- Type
int
- last_edited¶
When the post was last edited (Unix timestamp), if at all. If never edited, this value will be null.
- Type
int
- message¶
Post content/message, including any raw HTML.
- Type
str
- url¶
Original post URL.
- Type
str
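Since `date` and `last_edited` are stored as Unix timestamps and `last_edited` may be null, a consumer might convert them as sketched below. The `from_unix` helper and the sample record are hypothetical.

```python
from datetime import datetime, timezone

def from_unix(ts):
    """Convert a Unix timestamp to an aware UTC datetime; pass None through."""
    return None if ts is None else datetime.fromtimestamp(ts, tz=timezone.utc)

# Hypothetical post record: never edited, so `last_edited` is null.
post = {"date": 1631019126, "last_edited": None}

posted_at = from_unix(post["date"])
edited_at = from_unix(post["last_edited"])
```

Using an explicit UTC timezone avoids results that depend on the local timezone of the machine running the code.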
- class proboards_scraper.database.ShoutboxPost(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table holds information for each shoutbox post.
- id¶
Autoincrementing primary key.
- Type
int
- date¶
When the post was made (Unix timestamp).
- Type
int
- message¶
Post content/message, including any HTML.
- Type
str
- class proboards_scraper.database.Thread(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table stores information for each thread.
- id¶
Thread id, obtained from the thread URL.
- Type
int
- announcement¶
Whether the thread is marked as an announcement.
- Type
bool
- locked¶
Whether the thread is locked.
- Type
bool
- title¶
Thread title.
- Type
str
- url¶
Original URL.
- Type
str
- sticky¶
Whether the thread is stickied.
- Type
bool
- views¶
Number of thread views.
- Type
int
- class proboards_scraper.database.User(**kwargs)[source]¶
Bases:
sqlalchemy.orm.decl_api.Base
This table holds information on users obtained from their user profile.
- id¶
User number obtained from the user’s profile URL, e.g.,
https://yoursite.proboards.com/user/21
refers to the user with user id 21. A negative value indicates a “guest” or deleted user and does not refer to an actual user id.
- Type
int
- age¶
User age. Optional.
- Type
int
- birthdate¶
User birthdate string. Optional.
- Type
str
- date_registered¶
Unix timestamp.
- Type
int
- email¶
User email.
- Type
str
- instant_messengers¶
Optional; a string consisting of semicolon-delimited messenger_name:screen_name pairs, e.g.,
"AIM:ssj_goku12;ICQ:12345;YIM:duffman20"
.
- Type
str
- gender¶
Optional (“Male”/“Female”/“Other”).
- Type
str
- group¶
Group/rank (e.g., “Regular Membership”, “Global Moderator”).
- Type
str
- last_online¶
Unix timestamp.
- Type
int
- latest_status¶
User’s latest status. Optional.
- Type
str
- location¶
User location. Optional.
- Type
str
- name¶
Display name.
- Type
str
- post_count¶
Number of posts (scraped from the user’s profile page); to get the actual number of posts by the user in the database, use the
posts
attribute.
- Type
int
- signature¶
User signature. Optional.
- Type
str
- url¶
Original user profile page URL.
- Type
str
- username¶
Registration name.
- Type
str
- website¶
User website. Optional.
- Type
str
- website_url¶
User website URL. Optional.
- Type
str
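The `instant_messengers` string format described above (semicolon-delimited messenger_name:screen_name pairs) can be split back into a mapping. A minimal sketch, with `parse_messengers` being a hypothetical helper:

```python
def parse_messengers(value):
    """Split 'NAME:screen;NAME:screen' into a {name: screen_name} dict."""
    if not value:
        return {}
    # Split on the first colon only, and skip fragments without one.
    pairs = (item.split(":", 1) for item in value.split(";") if ":" in item)
    return {name: screen_name for name, screen_name in pairs}

messengers = parse_messengers("AIM:ssj_goku12;ICQ:12345;YIM:duffman20")
empty = parse_messengers(None)
```

Because the attribute is optional, the helper treats a missing or empty value as an empty dict.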
- proboards_scraper.database.serialize(obj)[source]¶
Helper function that recursively serializes a database table object (or list of objects) and returns them as Python dictionaries.
- Parameters
obj (
Union[sqlalchemy.orm.decl_api.DeclarativeMeta, list]
) – A sqlalchemy Metaclass instance, i.e., an instance of one of the table classes documented in this module.
- Returns
Serialized version of the object (or list of objects).
- Return type
Union[dict, List[dict]]
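The idea of recursive serialization can be sketched with plain Python objects. This is not the package's `serialize()` implementation: `to_dict`, `Option`, and `PollLike` are hypothetical, and the sketch does not guard against circular references between objects.

```python
def to_dict(obj):
    """Recursively convert objects (or lists of objects) to plain dicts."""
    if isinstance(obj, list):
        return [to_dict(item) for item in obj]
    if hasattr(obj, "__dict__"):
        # Skip private attributes (names starting with an underscore).
        return {
            key: to_dict(value)
            for key, value in vars(obj).items()
            if not key.startswith("_")
        }
    return obj  # ints, strings, None, etc. are returned unchanged

class Option:  # hypothetical stand-in for a table class
    def __init__(self, id, name, votes):
        self.id, self.name, self.votes = id, name, votes

class PollLike:  # hypothetical stand-in for a table class
    def __init__(self, name, options):
        self.name, self.options = name, options

poll = PollLike("Best film?", [Option(1, "Escape from New York", 3)])
serialized = to_dict(poll)
```

Nested objects and lists of objects come out as nested dicts and lists, which can then be passed to `json.dumps`.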