Commit a6ec4ea

Deploy from jjohare/logseq @ 1cf34aaad371ee6bf7be705b15af2ae8130b0c57 jjohare/logseq@1cf34aa
1 parent 2c9f68f commit a6ec4ea

89 files changed

Lines changed: 495 additions & 495 deletions

api/pages/pages/A-Star Algorithm.json

Lines changed: 5 additions & 5 deletions
@@ -6,13 +6,13 @@
  "Search Algorithms"
  ],
  "wiki_links": [
- "Dijkstra's Algorithm",
- "Graph Theory",
- "Heuristic Methods",
  "Search Algorithms",
- "Pathfinding",
  "Priority Queue",
- "Route Planning"
+ "Pathfinding",
+ "Route Planning",
+ "Dijkstra's Algorithm",
+ "Heuristic Methods",
+ "Graph Theory"
  ],
  "ontology": {
  "term_id": "AI-1004",

api/pages/pages/Account Abstraction.json

Lines changed: 2 additions & 2 deletions
@@ -7,9 +7,9 @@
  "Decentralised Web"
  ],
  "wiki_links": [
- "ERC-4337",
- "Ethereum Standard",
  "BlockchainDomain",
+ "Ethereum Standard",
+ "ERC-4337",
  "Ethereum"
  ],
  "ontology": {

api/pages/pages/Active Learning.json

Lines changed: 5 additions & 5 deletions
@@ -7,13 +7,13 @@
  "Data Annotation"
  ],
  "wiki_links": [
- "Labeling Cost",
- "Online Learning",
- "Machine Learning",
  "Uncertainty Sampling",
- "Data-Efficient Learning",
+ "Machine Learning",
+ "Human-in-the-Loop",
+ "Online Learning",
  "Semi-Supervised Learning",
- "Human-in-the-Loop"
+ "Labeling Cost",
+ "Data-Efficient Learning"
  ],
  "ontology": {
  "term_id": "AI-1013",

api/pages/pages/Automated Planning.json

Lines changed: 5 additions & 5 deletions
@@ -8,13 +8,13 @@
  "STRIPS"
  ],
  "wiki_links": [
- "Artificial Intelligence",
- "STRIPS",
- "Planning and Scheduling",
  "Search Algorithms",
- "Formal Methods",
  "Autonomous Systems",
- "Knowledge Representation"
+ "Formal Methods",
+ "Knowledge Representation",
+ "Planning and Scheduling",
+ "STRIPS",
+ "Artificial Intelligence"
  ],
  "ontology": {
  "term_id": "AI-1008",

api/pages/pages/Bagging.json

Lines changed: 3 additions & 3 deletions
@@ -7,12 +7,12 @@
  ],
  "wiki_links": [
  "Ensemble Methods",
- "Random Forest",
  "Machine Learning",
- "Bootstrap Sampling",
  "Variance",
+ "Random Forest",
  "Parallel Training",
- "Overfitting"
+ "Overfitting",
+ "Bootstrap Sampling"
  ],
  "ontology": {
  "term_id": "AI-1016",

api/pages/pages/Boosting.json

Lines changed: 6 additions & 6 deletions
@@ -6,15 +6,15 @@
  "Ensemble Methods"
  ],
  "wiki_links": [
- "AdaBoost",
- "Gradient Boosting",
  "Ensemble Methods",
- "LightGBM",
+ "XGBoost",
+ "CatBoost",
  "Machine Learning",
+ "Gradient Boosting",
  "Bias",
- "CatBoost",
- "Accuracy",
- "XGBoost"
+ "AdaBoost",
+ "LightGBM",
+ "Accuracy"
  ],
  "ontology": {
  "term_id": "AI-1015",

api/pages/pages/Centralized Swarm Control.json

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@
  "backlinks": [],
  "wiki_links": [
  "RoboticsDomain",
- "Swarm Robotics",
- "Swarm Control Architecture"
+ "Swarm Control Architecture",
+ "Swarm Robotics"
  ],
  "ontology": {
  "term_id": "RB-9003",

api/pages/pages/Cosmos IBC.json

Lines changed: 3 additions & 3 deletions
@@ -5,10 +5,10 @@
  "backlinks": [],
  "wiki_links": [
  "Cosmos Network",
- "Layer 0 Protocol",
- "IBC Protocol",
+ "BlockchainDomain",
  "Interoperability Protocol",
- "BlockchainDomain"
+ "Layer 0 Protocol",
+ "IBC Protocol"
  ],
  "ontology": {
  "term_id": "BC-9001",

api/pages/pages/Data Annotation.json

Lines changed: 4 additions & 4 deletions
@@ -4,12 +4,12 @@
"content": "- ### OntologyBlock\n - ontology:: true\n - public-access:: true\n - term-id:: AI-1020\n - preferred-term:: Data Annotation\n - source-domain:: ai\n - status:: draft\n\n### Relationships\n- is-subclass-of:: [[Data Engineering]]\n- is-subclass-of:: [[Machine Learning Pipeline]]\n- skos:related:: [[Supervised Learning]]\n- skos:related:: [[Active Learning]]\n- skos:related:: [[Human-in-the-Loop]]\n- enables:: [[Training Data]]\n- required-for:: [[Supervised Learning]]\n\n### Definition\nData annotation is the process of labeling or tagging raw data (images, text, audio, video) with meaningful, informative labels that provide context and ground truth for supervised machine learning models. It involves human annotators or semi-automated systems identifying and marking features, objects, sentiments, entities, or other attributes in data to create training datasets that algorithms can learn from.\n\n### Importance\n- Foundation of supervised learning\n- Quality determines model ceiling\n- Often the bottleneck in AI projects\n- Expensive and time-consuming (50-70% of project cost)\n- Critical for model accuracy and reliability\n- Enables evaluation and validation\n\n### Annotation Types by Data Modality\n**Image Annotation:**\n- Bounding boxes (object detection)\n- Polygons/polylines (precise boundaries)\n- Semantic segmentation (pixel-level classes)\n- Instance segmentation (individual objects)\n- Keypoint annotation (landmarks, poses)\n- Image classification tags\n- 3D cuboids (depth/orientation)\n\n**Text Annotation:**\n- Named Entity Recognition (NER) tags\n- Part-of-speech tagging\n- Sentiment labels (positive/negative/neutral)\n- Intent classification\n- Topic/category labels\n- Text span highlighting\n- Relation extraction\n- Coreference resolution\n\n**Audio Annotation:**\n- Speech transcription\n- Speaker diarization (who spoke when)\n- Emotion labeling\n- Sound event detection\n- Music instrument tagging\n- Acoustic scene classification\n\n**Video 
Annotation:**\n- Frame-by-frame object tracking\n- Action recognition labels\n- Event temporal boundaries\n- Scene segmentation\n- Pose tracking over time\n- Crowd counting\n\n### Annotation Methods\n**Manual Annotation:**\n- Human annotators label data\n- Highest quality but expensive\n- Domain expertise may be required\n- Inter-annotator agreement crucial\n\n**Semi-Automated:**\n- Pre-labeling with models\n- Human review and correction\n- Active learning loops\n- Faster and cheaper\n\n**Crowdsourcing:**\n- Distributed to many workers\n- Platforms: Amazon MTurk, Labelbox, Scale AI\n- Requires quality control\n- Good for simple tasks\n\n**Programmatic (Weak Supervision):**\n- Labeling functions/rules\n- Heuristics and patterns\n- Knowledge bases\n- Snorkel framework\n\n**Transfer/Self-Supervised:**\n- Use pre-trained models\n- Synthetic data generation\n- Data augmentation with labels\n\n### Annotation Tools\n**Image/Video:**\n- CVAT (Computer Vision Annotation Tool)\n- LabelImg\n- VGG Image Annotator (VIA)\n- Labelbox\n- V7 Darwin\n- Supervisely\n\n**Text:**\n- Prodigy\n- Label Studio\n- Doccano\n- Brat\n- Tagtog\n\n**Multi-Modal:**\n- Amazon SageMaker Ground Truth\n- Scale AI\n- Labelbox\n- Supervisely\n\n### Quality Assurance\n**Inter-Annotator Agreement:**\n- Cohen's Kappa\n- Fleiss' Kappa (3+ annotators)\n- Krippendorff's Alpha\n- Percentage agreement\n\n**Consensus Methods:**\n- Majority voting (multiple annotators)\n- Expert adjudication\n- Weighted voting\n- Expectation-maximization\n\n**Quality Control:**\n- Gold standard test sets\n- Random audits\n- Attention checks\n- Training and guidelines\n- Feedback loops\n\n### Annotation Guidelines\n**Essential Components:**\n- Clear definitions of labels\n- Edge case handling\n- Examples (positive and negative)\n- Decision trees for ambiguity\n- Consistency rules\n- Iterative refinement\n\n**Best Practices:**\n- Pilot annotation phase\n- Regular calibration sessions\n- Version control for guidelines\n- FAQ for 
common issues\n- Visual examples\n\n### Challenges\n**Subjectivity:**\n- Ambiguous cases\n- Annotator bias\n- Inconsistent interpretations\n\n**Scalability:**\n- Millions of examples needed\n- High cost per example\n- Time constraints\n\n**Quality vs. Cost:**\n- Expert annotators expensive\n- Crowdworkers variable quality\n- Balance needed\n\n**Privacy:**\n- Sensitive data (medical, financial)\n- Regulatory compliance (GDPR, HIPAA)\n- Anonymization required\n\n**Class Imbalance:**\n- Rare events expensive to find\n- Biased training data\n- Active learning helps\n\n### Cost Optimization Strategies\n1. **Active learning:** Annotate most informative examples\n2. **Transfer learning:** Use pre-trained models\n3. **Weak supervision:** Programmatic labeling\n4. **Data augmentation:** Multiply labeled examples\n5. **Semi-supervised learning:** Leverage unlabeled data\n6. **Crowdsourcing:** Scale with many workers\n7. **Pre-labeling:** Model-assisted annotation\n\n### Ethical Considerations\n- Fair compensation for annotators\n- Working conditions (gig economy issues)\n- Exposure to disturbing content (moderation)\n- Cultural sensitivity\n- Bias in annotations (reflects annotator demographics)\n- Privacy of data subjects\n\n### Emerging Trends\n**Foundation Models:**\n- Reduce annotation needs\n- Few-shot learning\n- Zero-shot capabilities\n\n**Synthetic Data:**\n- Generative models create labeled data\n- Simulation environments (robotics)\n- Reduced cost\n\n**Interactive Annotation:**\n- Human-AI collaboration\n- Iterative refinement\n- Real-time feedback\n\n**Annotation as a Service:**\n- Managed platforms (Scale AI, Labelbox)\n- End-to-end pipelines\n- Quality guarantees\n\n### Impact on Model Performance\n- **Quantity:** More data generally helps (diminishing returns)\n- **Quality:** Clean, consistent labels critical\n- **Coverage:** Diverse examples improve generalization\n- **Balance:** Class distribution affects metrics\n- **Granularity:** Label detail matches task 
needs\n\n### Annotation Project Workflow\n1. **Define task and labels**\n2. **Create annotation guidelines**\n3. **Pilot annotation (small batch)**\n4. **Measure inter-annotator agreement**\n5. **Refine guidelines**\n6. **Scale annotation**\n7. **Quality assurance checks**\n8. **Model training and evaluation**\n9. **Identify errors, re-annotate**\n10. **Iterate**\n\n### Metrics\n- Annotations per hour (productivity)\n- Cost per annotation\n- Inter-annotator agreement\n- Accuracy vs. gold standard\n- Coverage (% of data annotated)\n\nData annotation bridges raw data and intelligent systems, transforming unstructured information into structured knowledge that powers supervised machine learning across computer vision, NLP, speech recognition, and beyond.",
  "backlinks": [],
  "wiki_links": [
- "Data Engineering",
- "Supervised Learning",
- "Machine Learning Pipeline",
  "Active Learning",
+ "Human-in-the-Loop",
+ "Machine Learning Pipeline",
+ "Supervised Learning",
  "Training Data",
- "Human-in-the-Loop"
+ "Data Engineering"
  ],
  "ontology": {
  "term_id": "AI-1020",

api/pages/pages/Data Cleaning.json

Lines changed: 5 additions & 5 deletions
@@ -6,13 +6,13 @@
  "Feature Engineering"
  ],
  "wiki_links": [
- "Data Engineering",
  "Machine Learning",
- "Data Quality",
- "Data Analysis",
- "ETL",
  "Feature Engineering",
- "Data Preprocessing"
+ "ETL",
+ "Data Quality",
+ "Data Preprocessing",
+ "Data Engineering",
+ "Data Analysis"
  ],
  "ontology": {
  "term_id": "AI-1018",
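
Review note: in every file shown here, the `wiki_links` entries removed and added are the same strings in a different order, which matches the equal addition and deletion counts. A minimal sketch of how that could be checked for one file, using the two lists from `A-Star Algorithm.json` as they appear in this diff. The helper names `is_reorder_only` and `canonical` are hypothetical and not part of the repository.

```python
from collections import Counter

# wiki_links for "A-Star Algorithm.json", copied from the first file in this diff.
before = [
    "Dijkstra's Algorithm",
    "Graph Theory",
    "Heuristic Methods",
    "Search Algorithms",
    "Pathfinding",
    "Priority Queue",
    "Route Planning",
]
after = [
    "Search Algorithms",
    "Priority Queue",
    "Pathfinding",
    "Route Planning",
    "Dijkstra's Algorithm",
    "Heuristic Methods",
    "Graph Theory",
]


def is_reorder_only(old, new):
    """Return True when both lists hold the same entries in a different order."""
    return Counter(old) == Counter(new) and old != new


def canonical(links):
    """Hypothetical normalization: a sorted copy gives one stable order."""
    return sorted(links)


print(is_reorder_only(before, after))
print(canonical(before) == canonical(after))
```

If the reordering comes from iterating an unordered collection in the export step (an assumption; nothing in this commit confirms it), writing the links through something like `canonical` before serialization would keep later deploys from producing order-only diffs.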
