diff --git a/docs/build.md b/docs/build.md index 34a9382..793a5b0 100644 --- a/docs/build.md +++ b/docs/build.md @@ -1,6 +1,7 @@ # Build steps (First time) -Note: Replace BUILD_TYPE with either `Release` or `Debug` depending on your requirements, If you are running this on the system you are building on, use `System` instead +Note: Replace BUILD_TYPE with either `Release` or `Debug` depending on your requirements, If you are running this on the +system you are building on, use `System` instead ``` git clone https://github.com/KJNeko/FGLEngine.git --recursive @@ -9,6 +10,7 @@ cmake --build build -j --target IDHANServer ``` If you want to also build the HydrusImporter you can do + ``` cmake --build build -j --target IDHANServer HydrusImporter ``` @@ -24,3 +26,7 @@ git submodule update --init --recursive cmake -DCMAKE_BUILD_TYPE=System -B build cmake --build build -j --target IDHANServer ``` + +# [Getting started](setup.md) + +Now you can get started setting up IDHAN \ No newline at end of file diff --git a/docs/config.md b/docs/config.md index 9c5f7df..bc6bd3c 100644 --- a/docs/config.md +++ b/docs/config.md @@ -19,7 +19,8 @@ IDHAN will search for config information in top-to-bottom order. 
-All config options can be provided in ENV variables if they are in the toml, the format is `IDHAN_$(GROUP)_$(NAME)`, `IDHAN\_` is +All config options can be provided in ENV variables if they are in the toml, the format is `IDHAN_$(GROUP)_$(NAME)`, +`IDHAN\_` is use to prevent accidental environment collisions ## Linux diff --git a/docs/docker.md b/docs/docker.md index a196629..4d5f575 100644 --- a/docs/docker.md +++ b/docs/docker.md @@ -18,6 +18,7 @@ port = 8080 would be `IDHAN_SERVER_PORT=8080` # Example docker-compose + You should likely be competent enough to understand how to use this ``` diff --git a/docs/idea.md b/docs/idea.md index 92c80b4..69aea0d 100644 --- a/docs/idea.md +++ b/docs/idea.md @@ -1,11 +1,13 @@ This document contains a lot of the key ideas and concepts for the current and future development.\ -Note that this document might not match the current state of IDHAN and is mostly a guide for myself to stay on track that this document might not match the current state of IDHAN and is mostly a guide for myself to stay on track. +Note that this document might not match the current state of IDHAN and is mostly a guide for myself to stay on +track. The application/concept name is **IDHAN** (**I** **D**on't **H**ave **A** **N**ame). (Better name maybe pending) # Notes -- Anything marked with **(P)** means that the idea is planned however will very likely not be implemented for the first version of IDHAN. +- Anything marked with **(P)** means that the idea is planned but will very likely not be implemented for the first + version of IDHAN. # Key terminology @@ -33,13 +35,16 @@ A record can be linked to many different things depending on what it needs to re # Tags -Tags are made up of two components, A namespace, And a subtag. Namespaces are used as a way to group various tags together simply. 
+Tags are made up of two components: a namespace and a subtag. Namespaces are used as a way to group various tags +together simply. The subtag is the main visible component of a tag, It is used to represent information of the media being tagged. These two components are represented as `namespace_id` and `subtag_id` both of which use the type `std::uint32_t`. They are combined into a tag that is given the id `tag_id` of the type `std::uint64_t` -Tags are also assigned a 'domain', Domains are used to group tags to allow easy data manipulation at scale, An example of this would be creating a domain that is for tags that are identified by an AI model. This would prevent the model from possibly messing up tags that might be from a remote tag set, Which would also be given its own domain. +Tags are also assigned a 'domain'. Domains are used to group tags to allow easy data manipulation at scale. An example +of this would be creating a domain that is for tags that are identified by an AI model. This would prevent the model +from possibly messing up tags that might be from a remote tag set, which would also be given its own domain. Tags are always displayed and stored as lowercase. @@ -54,7 +59,8 @@ Examples of a subtag: - `skirt` - `blue eyes` -Tags components are seperated by a `:` character. In the event that a namespace is 'empty' or blank, then there is no seperation character. And only the subtag should be displayed. +Tag components are separated by a `:` character. In the event that a namespace is 'empty' or blank, there is no +separation character, and only the subtag should be displayed. Examples of completed tags: @@ -62,7 +68,8 @@ Examples of completed tags: - `series:highschool dxd` - `catgirl` (Notice the lack of a namespace) -For inputs, The first separation character (`:`) is used. An example of this is in the case of `series:re:zero` the namespace is `series` and the subtag is `re:zero` +For inputs, the first separation character (`:`) is used. 
An example of this is the case of `series:re:zero`, where the +namespace is `series` and the subtag is `re:zero` ### Aliases @@ -75,67 +82,42 @@ Any attempt of aliasing an already aliased id should result in an error. ### Parents/Child -In some cases a tag might always be associated with another tag. The tags would almost always be together. In this case a parent/child relationship can be made. This relationship dictates that a child cannot be without it's parents. A tag can have any number of parents. An example of this would be: the tag `pussy` with the parent tag `rating:explicit`. +In some cases a tag might always be associated with another tag. The tags would almost always be together. In this case +a parent/child relationship can be made. This relationship dictates that a child cannot be without its parents. A tag +can have any number of parents. An example of this would be: the tag `pussy` with the parent tag `rating:explicit`. ### Siblings -Siblings are different from Hydrus, In IDHAN they work as an exclusive tagging. The best example of this is rating tags. The tag `rating:safe` should obviously never be with the tag `rating:explicit`. As something can't be both. This prevents that from happening. In this relationship two tags are designated as 'siblings', One being the 'older sibling', the other being the 'younger sibling'. If both siblings are present, Then the older one is presented while the younger one is hidden. +Siblings are different from Hydrus. In IDHAN they work as exclusive tagging. The best example of this is rating tags. +The tag `rating:safe` should obviously never be with the tag `rating:explicit`, as something can't be both. This +prevents that from happening. In this relationship two tags are designated as 'siblings', one being the 'older sibling', +the other being the 'younger sibling'. If both siblings are present, then the older one is presented while the younger +one is hidden. 
### Order of application (Probably a shitty name for this) Tags are 'solved' in the following order -- parents/childs and siblings are flattened. This means that all tags are transformed into their alias tag, or 'idealised'. +- parents/childs and siblings are flattened. This means that all tags are transformed into their alias tag, or + 'idealized'. - parent/child tags are then applied. -- Finally siblings are applied. +- Finally, siblings are applied. - When siblings are applied a parent that is younger than an older tag is hidden, If the child is supposed to be hidden, then all parents are also hidden or removed, Even if both tags were present on the record initally. Things to note: -- If child A and parent B are on a record with the parent being added due to child A existing, and parent B is the younger tag of an exclusive or tag, then child should remain with the parent removed. If the child is the younger tag and should be hidden, then the parent should not be displayed. +- If child A and parent B are on a record with the parent being added due to child A existing, and parent B is the + younger tag of an exclusive or tag, then child should remain with the parent removed. If the child is the younger tag + and should be hidden, then the parent should not be displayed. # File info -A record doesn't have to exist with a file info, But it's very likely to have one. -Internally there are a few generic enum values to represent what the file is. This is mainly used for hints on how to handle a file. - -Enum values: - -| Name | Value | Description | -|------------|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Media File | `0b0001` | A Media file is the most common type, This can be a file such as a image or video. 
Examples: PNG, JPG, MPV, MP4, Ect | -| Generator | `0b0010` | This is a file that contains information to generate other files. An example of this is a PSD or Krita art file. | -| Generated | `0b0100` | This marker indicates that the file has been generated by a generator and can be replicated again. The config for generating this file should be stored and which record it was generated from | -| Virtual | `0b1000` | This is the same as Generated however the file itself is not stored in any file cluster, It should be generated again when needed | - -This information set also contains basic information about the file: - -- media size (byte size of the media, even virtual files will contain this after being generated) -- store time. The time in which a file was placed into a cluster, if the file is virtual this is the same as the generator store time. - # File parsing -Even if IDHAN can identify a file, it will ignore any MIME types not registered to be handled by IDHAN. -There are defaults registered by default such as JPG and PNG and other common formats.\ -A full list can be found here:(TODO: Put the list here) - -## Importing methods - -IDHAN will parse files that it has been given to import through the following methods: - -### Internal parser - -IDHAN will use an internal MIME parser that works based on byte signatures. -Each MIME will be given a set of signatures and their expected locations, and a priority, -The priority is to allow differentiation between APNG and PNG. 
- -### Python scripts **(P)** - -IDHAN will allow for python scripts to be used in order to identify the mime type of file, -This will happen after the internal parser has run and found no results, Or if IDHAN has been specified to use the script after a specific parser hit.\ -IDHAN will supply a series of helper functions in python that can be imported in python to assist with giving back the response IDHAN expects.\ -These scripts will also be capable of returning file information back to IDHAN that can help with understanding new filetypes that IDHAN has never seen before without adding handling in the source code itself. +IDHAN is made to be expandable. As such, file parsing is fairly dynamic: MIME parsers can be added via a JSON file that +teaches IDHAN how to understand a file, and thumbnailers can be added to help IDHAN generate thumbnails to serve to +other viewer tools. # Scraping **(P)** @@ -152,17 +134,24 @@ There are a few key config options for it - The path to store the files - What files are allowed. (files, thumbnails, generators, generated, ect) -- The ratio of which file should be stored ( This value will be an integer in most cases and will be represented as a percent of `N / TotalN` ) +- The ratio of which file should be stored (this value will be an integer in most cases and will be represented as a + percent of `N / TotalN`) - The max byte size that the cluster should reach. # Collections **(P)** -In IDHAN a collection is simply to tie multiple records together. A collection can be made to inherit all the tags of its members and even have tags of its own. +In IDHAN a collection simply ties multiple records together. A collection can be made to inherit all the tags of +its members and even have tags of its own. The tags of a collection follow the same tags as individual records. 
# Generators -Generators are used in order to provide a way for IDHAN to keep a record of a source of multiple files, Examples of this would be the ability to store a single PSD and generate variants of an image without needing to store the variants. -Generators will be capable of being defined by a python script that will register itself to handle specific files. Upon doing so each file that the generator can apply too will be scanned and processed through the generator's pre-processing stage. This will collect information that can be used to configure the generation process by the user. +Generators are used in order to provide a way for IDHAN to keep a record of a source of multiple files. An example of this +would be the ability to store a single PSD and generate variants of an image without needing to store the variants. +Generators will be capable of being defined by a python script that will register itself to handle specific files. Upon +doing so, each file that the generator can apply to will be scanned and processed through the generator's pre-processing +stage. This will collect information that can be used to configure the generation process by the user. -An example of this generator system would be the ability to register a PSD file that contains layers that allow for variants of an image to be created, Such as differences in clothing depending on enabled layers. The API will expose a way to configure the data for generation that is defined by the python script. \ No newline at end of file +An example of this generator system would be the ability to register a PSD file that contains layers that allow for +variants of an image to be created, such as differences in clothing depending on enabled layers. The API will expose a +way to configure the data for generation that is defined by the python script. 
\ No newline at end of file diff --git a/docs/setup.md b/docs/setup.md new file mode 100644 index 0000000..8faf157 --- /dev/null +++ b/docs/setup.md @@ -0,0 +1,63 @@ +# Getting started + +## Config + +The first thing you'll want to do is set up some configs + +Note: Each config will be listed as `[group] name`. This can either be set via an ENV variable `IDHAN_GROUP_NAME` or in +one of the toml files + +Example: + +```toml +[group] +name = value +``` + +### Thumbnails + +The main thing to set up here will be where IDHAN will place thumbnails it's generated via one of the thumbnail +generators. To do so you'll want to set a path for `[thumbnails] path` + +### Postgres + +The main values to set here will be +(all are in group `[database]`) + +- `host` - The host or ip of the pg database +- `user` - The user to attempt to sign in with +- `password` - The password to try to use +- `database` - The database name + +IDHAN will create a public schema if it does not exist, as well as all the other tables. Note that if you are starting +from scratch, completely wipe the schema that was created last time, as IDHAN will also create some functions that won't +be wiped if you just drop all the tables. + +## Creating your first cluster + +###### If you are coming from Hydrus, this is just a file path set in the 'database' management stuff. + +A cluster is a file location where IDHAN can put files. It will expect all files placed there to be named with their hash +and have an extension. It will ONLY make changes or place files if it is NOT set to readonly, which is the default for +any new cluster during creation. + +To easily make a cluster without any 3rd-party tool. 
You can start the server and access the swagger api docs via +`/docs`. If this does not result in a valid webpage or has errors, see the troubleshooting guide. + +Once you've opened the swagger docs, go down to `clusters` and find `/clusters/add`. Expand it and hit `try it out`; +from there you can then change the template json to fit your requirements. If the creation succeeds you should see a +result that contains much of the information you've just entered, as well as some extra info along with a cluster_id. +Once you've done that you can then hit the scan endpoint (read below). + +## Scanning a cluster + +To scan a cluster you can use the `/clusters/{id}/scan` endpoint. However, please read the swagger docs for the various +parameters you can enter. The defaults *should* work for the most part. If the cluster was not created as readonly, it +might make some changes to the structure without asking. If this is a worry for you and you have NOT set the cluster to +readonly, you can either modify it to be readonly, or scan with `force_readonly=true` as one of the query parameters. + +## Tagging/Getting files + +There are too many things to list here for a simple getting started guide. Please see the swagger docs for the various +tag endpoints. Note that files will NOT be returned in a search UNLESS they've been scanned in a cluster first, even if +tagged. \ No newline at end of file